-
Type: Question
-
Resolution: Community Answered
-
Priority: Major - P3
-
None
-
Affects Version/s: 4.2.2
-
Component/s: Write Ops
-
None
-
Fully Compatible
We have a 3 shards cluster, but as soon as we start to load data, the cpu usage goes up to 100%, no other operation goes on during that time, IO is very low, we have pretty much memory (60GB), we're unable to go live because of this issue, any one can help with this?
Here is out conf file:
- cat /data/mongodb/shard1/mongod.conf
systemLog:
destination: file
logAppend: true
logRotate: reopen
path: /data/mongodb/shard1/log/mongod.log
operationProfiling:
slowOpThresholdMs: "10000"
mode: "off"
storage:
dbPath: /data/mongodb/shard1/data
journal:
enabled: true
commitIntervalMs: 500
wiredTiger:
engineConfig:
cacheSizeGB: 60
processManagement:
fork: true
pidFilePath: /data/mongodb/shard1/run/mongod.pid
net:
port: 27001
bindIp: 0.0.0.0
replication:
replSetName: shard1
sharding:
clusterRole: shardsvr
storage.wiredTiger.engineConfig.configString: "eviction_dirty_target=60,eviction_dirty_trigger=80,eviction=(threads_min=1,threads_max=2)"
Here is some mongostat info:
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time
*0 *0 *0 *0 0 2|0 0.0% 53.7% 1 35.8G 33.8G 0|0 1|0 859b 9.18k 43 shard1 PRI Mar 28 00:57:34.858
*0 *0 *0 *0 0 2|0 0.0% 53.7% 0 35.8G 33.8G 0|0 1|0 778b 12.3k 43 shard1 PRI Mar 28 00:57:49.859
*0 *0 *0 *0 0 2|0 0.0% 53.7% 0 35.8G 33.8G 0|0 1|0 912b 16.3k 43 shard1 PRI Mar 28 00:58:04.858
*0 *0 *0 *0 0 2|0 0.0% 53.7% 0 35.8G 33.8G 0|0 1|0 1.03k 12.7k 43 shard1 PRI Mar 28 00:58:19.858
*0 *0 *0 *0 0 2|0 0.0% 53.7% 1 35.8G 33.8G 0|0 1|0 625b 9.09k 43 shard1 PRI Mar 28 00:58:34.859
*0 *0 *0 *0 0 2|0 0.0% 53.7% 0 35.8G 33.8G 0|0 1|0 932b 17.3k 43 shard1 PRI Mar 28 00:58:49.858
*0 *0 *0 *0 0 1|0 0.0% 53.7% 0 35.8G 33.8G 0|0 1|0 530b 8.68k 43 shard1 PRI Mar 28 00:59:04.858
*0 *0 *0 *0 0 2|0 0.0% 53.7% 0 35.8G 33.8G 0|0 1|0 1.08k 12.7k 43 shard1 PRI Mar 28 00:59:19.859
*0 *0 *0 *0 0 2|0 0.0% 53.7% 1 35.8G 33.8G 0|0 1|0 593b 13.9k 43 shard1 PRI Mar 28 00:59:34.858
*0 *0 *0 *0 0 2|0 0.0% 53.7% 0 35.8G 33.8G 0|0 1|0 702b 12.2k 43 shard1 PRI Mar 28 00:59:49.858
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time
*0 *0 *0 *0 0 1|0 0.0% 53.7% 0 35.8G 33.8G 0|0 1|0 636b 11.2k 43 shard1 PRI Mar 28 01:00:04.858
273 *0 *0 *0 8 9|0 0.2% 53.7% 0 35.8G 33.8G 0|0 1|0 168k 217k 46 shard1 PRI Mar 28 01:00:19.854
9696 *0 *0 *0 337 346|0 4.9% 54.1% 1 35.9G 34.0G 0|0 1|0 5.90m 7.17m 47 shard1 PRI Mar 28 01:00:34.860
10007 *0 *0 *0 322 314|0 10.1% 54.5% 0 36.2G 34.3G 0|0 1|0 6.04m 7.37m 49 shard1 PRI Mar 28 01:00:49.858
10489 *0 *0 *0 350 362|0 15.1% 55.0% 0 36.4G 34.6G 0|0 1|1 6.37m 7.75m 49 shard1 PRI Mar 28 01:01:04.856
10776 *0 *0 *0 348 359|0 20.3% 55.4% 0 36.7G 34.8G 0|0 1|0 6.52m 7.94m 49 shard1 PRI Mar 28 01:01:19.857
7538 *0 *0 *0 205 196|0 23.9% 55.7% 0 36.9G 35.0G 0|0 1|0 4.51m 5.49m 49 shard1 PRI Mar 28 01:01:34.868
4015 *0 *0 *0 36 2|0 25.7% 55.9% 0 37.0G 35.1G 0|0 1|0 2.30m 2.83m 48 shard1 PRI Mar 28 01:01:49.866
3500 *0 *0 *0 34 2|0 27.4% 56.0% 0 37.1G 35.2G 0|0 1|1 2.00m 2.46m 48 shard1 PRI Mar 28 01:02:04.854
3846 *0 *0 *0 35 2|0 29.1% 56.2% 0 37.2G 35.3G 0|0 1|0 2.20m 2.71m 49 shard1 PRI Mar 28 01:02:19.875
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time
3445 *0 *0 *0 54 2|0 30.6% 56.3% 0 37.3G 35.4G 0|0 1|0 1.98m 2.45m 48 shard1 PRI Mar 28 01:02:34.858
3867 *0 *0 *0 37 2|0 32.4% 56.5% 0 37.4G 35.5G 0|0 1|0 2.21m 2.71m 48 shard1 PRI Mar 28 01:02:49.857
3795 *0 *0 *0 35 2|0 34.2% 56.6% 0 37.5G 35.6G 0|0 1|0 2.17m 2.67m 48 shard1 PRI Mar 28 01:03:04.880
3882 *0 *0 *0 33 2|0 36.1% 56.8% 0 37.6G 35.7G 0|0 1|0 2.22m 2.74m 47 shard1 PRI Mar 28 01:03:19.868
3784 *0 *0 *0 35 2|0 37.9% 56.9% 0 37.7G 35.8G 0|0 1|1 2.17m 2.65m 47 shard1 PRI Mar 28 01:03:34.864
3791 *0 *0 *0 35 2|0 39.6% 57.1% 0 37.8G 35.9G 0|0 1|0 2.16m 2.66m 47 shard1 PRI Mar 28 01:03:49.875
4070 *0 *0 *0 28 1|0 41.5% 57.3% 0 37.9G 36.0G 0|0 1|0 2.33m 2.86m 47 shard1 PRI Mar 28 01:04:04.869
3992 *0 *0 *0 1 2|0 43.3% 57.4% 0 38.0G 36.1G 0|0 1|0 2.28m 1.22m 47 shard1 PRI Mar 28 01:04:19.866
3679 *0 *0 *0 0 2|0 45.0% 57.6% 0 38.1G 36.2G 0|0 1|0 2.09m 1.15m 48 shard1 PRI Mar 28 01:04:34.854
3857 *0 *0 *0 0 2|0 46.9% 57.7% 0 38.2G 36.3G 0|0 1|0 2.19m 35.6k 48 shard1 PRI Mar 28 01:04:49.857
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time
2088 *0 *0 *0 0 2|0 47.8% 57.8% 0 38.3G 36.4G 0|0 1|0 1.19m 1.14m 48 shard1 PRI Mar 28 01:05:04.860
*0 *0 *0 *0 0 2|0 47.7% 57.8% 0 38.3G 36.4G 0|0 1|0 737b 12.1k 48 shard1 PRI Mar 28 01:05:19.876
Here is some iostat info:
avg-cpu: %user %nice %system %iowait %steal %idle
99.32 0.00 0.66 0.00 0.00 0.02
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 13.73 499.00 275.53 22978.40 6031.20 74.91 2.68 3.46 1.00 7.92 0.10 7.67
dm-0 0.00 0.00 0.00 0.13 0.00 0.80 12.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 39.20 0.00 198.93 10.15 0.14 3.67 0.00 3.67 0.07 0.26
dm-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-4 0.00 0.00 499.00 249.87 22978.40 5830.67 76.94 2.58 3.44 1.00 8.32 0.10 7.53