Type: Bug
Resolution: Done
Priority: Blocker - P1
Affects Version/s: 1.8.1
Component/s: Performance
Environment: Linux ip-10-164-19-105 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:34:28 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
EC2 m2.2xlarge instance
Operating System: Linux

I am seeing very strange behavior from mongo on our Nimble server. We use mongodb to run the cache for our product. The cache consists of a single capped collection with a few indexed fields. Here are its stats:
db> db.cache.stats()
{
"ns" : "nimble_cache.cache",
"count" : 6795240,
"size" : 29891215048,
"avgObjSize" : 4398.8461111013,
"storageSize" : 30000000304,
"numExtents" : 15,
"nindexes" : 3,
"lastExtentSize" : 256,
"paddingFactor" : 1,
"flags" : 0,
"totalIndexSize" : 3472837440,
"indexSizes" :
,
"capped" : 1,
"max" : 2147483647,
"ok" : 1
}
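For context, a collection like this would be created roughly as follows. This is only a sketch based on the stats above (the cap size and max come from them); the index field names are made up, not our actual schema:
db> db.createCollection("cache", { capped : true, size : 30000000000, max : 2147483647 })
db> db.cache.ensureIndex({ field_a : 1 })  // repeated once per indexed field; field names here are illustrative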
It runs on an m2.2xlarge EC2 instance, which has 34 GB of memory. Here are the memory stats of the mongo instance:
> db.serverStatus().mem
{ "bits" : 64, "resident" : 6300, "virtual" : 34749, "supported" : true, "mapped" : 32960 }The load on server isn't really high (lock percentage is terrible however):
[root@ip-10-164-1-195 ~]# /opt/mongo/bin/mongostat
connected to: 127.0.0.1
insert query update delete getmore command flushes mapped vsize res faults locked % idx miss % qr|qw ar|aw netIn netOut conn time
93 368 89 0 0 86 0 32.2g 33.9g 6.2g 7 39.8 0 5|7 5|8 444k 684k 151 22:56:44
66 302 85 0 0 90 0 32.2g 33.9g 6.2g 3 77.4 0 0|0 0|0 325k 504k 151 22:56:45
47 153 43 0 0 44 0 32.2g 33.9g 6.2g 4 77.1 0 0|0 0|0 327k 386k 151 22:56:46
67 256 70 0 0 66 0 32.2g 33.9g 6.2g 4 35.9 0 6|8 6|9 329k 437k 151 22:56:47
103 384 104 0 0 106 0 32.2g 33.9g 6.2g 5 60.9 0 6|6 6|7 449k 755k 151 22:56:48
56 207 71 0 0 75 0 32.2g 33.9g 6.2g 5 53.3 0 5|6 5|7 390k 623k 151 22:56:49
47 201 40 0 0 39 0 32.2g 33.9g 6.2g 4 78.1 0 4|4 4|5 348k 473k 151 22:56:50
99 381 117 0 0 121 0 32.2g 33.9g 6.2g 5 27.6 0 0|0 0|0 445k 766k 151 22:56:51
78 308 89 0 0 90 0 32.2g 33.9g 6.21g 5 39.3 0 0|0 0|0 461k 657k 151 22:56:52
115 374 122 0 0 120 0 32.2g 33.9g 6.21g 8 4.7 0 3|7 3|8 484k 647k 151 22:56:54
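Resident memory stays around 6.2 GB even though roughly 32 GB is mapped. A quick way to keep an eye on that ratio from the shell (just a sketch built on the same serverStatus() call shown above):
> while (true) { var m = db.serverStatus().mem; print(m.resident + " MB resident / " + m.mapped + " MB mapped"); sleep(1000); }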
All updates are on non-indexed fields. All queries are using indexes.
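We check the query side with explain(); on 1.8 an index-backed query reports a BtreeCursor rather than a BasicCursor. The field name below is illustrative, not our real schema:
> db.cache.find({ cache_key : "some-key" }).explain().cursor  // expect something like "BtreeCursor cache_key_1"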
The problem I have is very high disk I/O activity, which is what causes the lock levels above. Here's the output of iostat:
[root@ip-10-164-1-195 ~]# iostat -xdm /dev/sdk 2
Linux 2.6.21.7-2.fc8xen (ip-10-164-1-195) 06/07/11
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdk 0.05 259.76 83.07 407.36 2.52 5.83 34.86 43.29 88.28 1.63 79.93
sdk 0.00 0.00 6.00 437.00 0.08 1.74 8.42 105.36 260.29 2.26 99.90
sdk 0.00 0.00 6.00 741.00 0.06 3.03 8.47 39.87 67.59 1.32 98.95
sdk 0.00 4.00 25.00 2519.00 0.17 10.07 8.24 80.18 31.24 0.39 98.30
sdk 0.00 25.00 14.00 1530.00 0.14 6.33 8.59 23.92 15.76 0.64 98.10
sdk 0.00 2.50 6.50 422.50 0.07 1.73 8.57 55.19 96.98 2.32 99.60
sdk 0.00 0.50 11.50 135.00 0.11 0.54 9.17 44.41 363.53 6.82 99.95
sdk 0.00 0.00 8.00 104.00 0.08 0.41 8.89 108.89 617.02 8.93 100.05
sdk 0.00 0.00 14.50 270.50 0.13 1.08 8.65 69.87 390.25 3.51 100.00
sdk 0.00 0.00 6.00 154.00 0.05 0.61 8.45 51.68 305.67 6.25 100.05
sdk 0.00 0.00 9.50 204.50 0.07 0.86 8.93 24.03 139.49 4.67 99.90
I have tried disabling journaling and increasing the syncdelay parameter (it is now set to 300 seconds), with no positive results so far. One thing I noticed is that after a mongod restart everything works pretty well for about 10-15 minutes; after that, disk I/O starts rising and within a few minutes device utilization reaches 100%.
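For reference, this is roughly how mongod is started now (the dbpath is illustrative; as I understand the 1.8 options, journaling is only on when --journal is passed, so leaving that flag off runs without a journal, and --syncdelay raises the background flush interval):
[root@ip-10-164-1-195 ~]# /opt/mongo/bin/mongod --dbpath /data/db --syncdelay 300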
I'm really out of ideas for how to deal with this.