Type: Bug
Resolution: Duplicate
Priority: Major - P3
None
Affects Version/s: None
Component/s: Sharding
None
ALL
We have a sharded cluster with 6 mongos instances and 3 shards.
Each mongos is limited to 8 cores by a cgroup, and taskExecutorPoolSize is set to 4.
Each shard has 1 primary, 4 secondaries, and 1 hidden member.
We use YCSB to load-test it with 48 threads.
Most of the time everything runs fine. However, during a 10-minute test a steep drop in both CPU usage and opcounters appears on mongos once or twice.
When everything is running normally, per-thread CPU usage looks like this:
S 35.0 0.0 8:35.77 TaskExe.rPool-1
S 35.0 0.0 8:52.79 TaskExe.rPool-2
S 35.0 0.0 8:32.40 TaskExe.rPool-3
S 30.0 0.0 9:18.88 TaskExe.rPool-0
When the steep drop happens, one TaskExecutor thread saturates its CPU while the others go nearly idle:
R 99.9 0.0 8:54.62 TaskExe.rPool-2
S 10.0 0.0 9:19.35 TaskExe.rPool-0
S 10.0 0.0 8:36.19 TaskExe.rPool-1
R 5.0 0.0 8:32.80 TaskExe.rPool-3
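Why would one hot thread drag down the whole mongos, including the other TaskExecutor threads that fall to about 10%? A toy model may help. The model rests on three assumptions that the report does not confirm:
- mongos deals operations round-robin across its taskExecutorPoolSize executors.
- Each YCSB thread waits for its current operation before issuing the next one.
- The saturated executor is simplified to one that never replies.

The function `simulate_round_robin` and its parameters are hypothetical, for illustration only.

```python
from typing import List, Optional, Tuple


def simulate_round_robin(num_clients: int, num_executors: int,
                         stalled: Optional[int], rounds: int) -> Tuple[List[int], int]:
    """Toy model: closed-loop clients whose operations are dealt round-robin
    across executors. Returns (operations completed per round, clients blocked)."""
    cursor = 0  # global round-robin position (assumed dispatch policy)
    blocked = [False] * num_clients
    completed_per_round = []
    for _ in range(rounds):
        completed = 0
        for client in range(num_clients):
            if blocked[client]:
                continue  # still waiting for a reply from the stalled executor
            executor = cursor % num_executors
            cursor += 1
            if executor == stalled:
                blocked[client] = True  # operation queues behind the stalled executor
            else:
                completed += 1  # served by a healthy executor this round
        completed_per_round.append(completed)
    return completed_per_round, sum(blocked)


if __name__ == "__main__":
    # 48 client threads and 4 executors, as in the report; executor 2 stalls.
    per_round, stuck = simulate_round_robin(48, 4, stalled=2, rounds=5)
    print(per_round, stuck)  # [36, 27, 20, 15, 12] 36
```

In this model, each round a client lands on the stalled executor with probability of roughly 1/N. Surviving throughput therefore shrinks by about (N-1)/N per round until every client is stuck. A larger executor count slows the collapse but does not prevent it. Idle healthy executors are what the low CPU of the other TaskExe.rPool threads would look like under this assumption.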
pstack output for the saturated TaskExecutor thread:
Thread 85 (Thread 0x7f01c6203700 (LWP 129527)):
#0 0x00007f01cedb86d0 in sha256_block_data_order_avx2 ()
#1 0x00007f01cedb9935 in SHA256_Update ()
#2 0x00007f01ced58817 in HMAC_Init_ex ()
#3 0x00007f01cec04e0f in mongo::SHA256BlockTraits::computeHmac( xxx )
perf top while the CPU is saturated shows about 7.32% of samples in OPENSSL_cleanse.
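The report does not say which code path calls `computeHmac`. One hypothesis worth checking is SCRAM-SHA-256 authentication of newly opened mongos-to-shard connections. This is an assumption, not something established here. The linked SERVER-53853 does mention a large buildup of mongos-to-mongod connections. SCRAM key derivation is PBKDF2, which chains thousands of HMAC-SHA256 calls, and that would fit a hot `HMAC_Init_ex`/`SHA256_Update` stack.

The sketch below writes out the first PBKDF2-HMAC-SHA256 block by hand (RFC 8018) to show that cost grows linearly with the iteration count. The iteration count of 15000 is assumed to be MongoDB's default `scramSHA256IterationCount`; check the value configured on your cluster.

```python
import hashlib
import hmac
import time


def pbkdf2_sha256_first_block(password: bytes, salt: bytes, iterations: int) -> bytes:
    """First 32-byte block of PBKDF2-HMAC-SHA256, written out by hand.

    Each loop iteration is one complete HMAC-SHA256 computation, the same
    kind of work as the HMAC_Init_ex / SHA256_Update frames in the pstack.
    """
    # U1 = HMAC(password, salt || INT_32_BE(1))
    u = hmac.new(password, salt + (1).to_bytes(4, "big"), hashlib.sha256).digest()
    result = u
    # U_j = HMAC(password, U_{j-1}); the block is U1 xor U2 xor ... xor U_c
    for _ in range(iterations - 1):
        u = hmac.new(password, u, hashlib.sha256).digest()
        result = bytes(a ^ b for a, b in zip(result, u))
    return result


if __name__ == "__main__":
    # Assumed iteration count (believed default scramSHA256IterationCount).
    iterations = 15000
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", b"example-password", b"example-salt", iterations)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{iterations} HMAC-SHA256 calls for one derivation took {elapsed_ms:.1f} ms")
```

If this hypothesis holds, every uncached key derivation costs one executor thread thousands of HMACs. A burst of new connections could then pin that thread, as seen above.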
Slow-operation entries also appear in the mongos log, while the secondaries' logs contain none. The reason is that the connection pool owned by the saturated TaskExecutor has a growing backlog of requests waiting to be sent. The connection pool stats log shows:
Updating controller for host:port with State: { requests: 19, ready: 0, pending: 2, active: 1, isExpired: false }
The requests count keeps growing while pending stays at 2. I think this is just a consequence of the TaskExecutor thread's CPU being saturated.
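The backlog pattern described above can be checked mechanically from the pool stats lines: requests grow, pending is stuck at 2, and ready is 0. The sketch below parses only the `State: { ... }` fields quoted in the report. The names `PoolState`, `parse_pool_state` and `is_stuck_backlog` are hypothetical, and the three-sample window is an assumed threshold rather than anything mongos defines.

```python
import re
from dataclasses import dataclass
from typing import List

# Matches the State: { ... } fragment of the pool stats line quoted in the report.
STATE_RE = re.compile(
    r"requests: (\d+), ready: (\d+), pending: (\d+), active: (\d+), "
    r"isExpired: (true|false)"
)


@dataclass(frozen=True)
class PoolState:
    requests: int
    ready: int
    pending: int
    active: int
    is_expired: bool


def parse_pool_state(line: str) -> PoolState:
    """Extract the connection pool counters from one log line."""
    m = STATE_RE.search(line)
    if m is None:
        raise ValueError(f"no pool state found in: {line!r}")
    requests, ready, pending, active, expired = m.groups()
    return PoolState(int(requests), int(ready), int(pending), int(active),
                     expired == "true")


def is_stuck_backlog(states: List[PoolState], window: int = 3) -> bool:
    """Heuristic: queued requests grow over the last `window` samples while
    no connection is ready and the pending count does not move."""
    if len(states) < window:
        return False
    recent = states[-window:]
    growing = all(b.requests > a.requests for a, b in zip(recent, recent[1:]))
    pending_flat = len({s.pending for s in recent}) == 1
    none_ready = all(s.ready == 0 for s in recent)
    return growing and pending_flat and none_ready
```

For example, three successive samples with requests 19, 25 and 31 would be flagged if ready stays 0 and pending stays 2. The 25 and 31 values are made-up follow-ups to the logged value of 19.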
Two more things are worth mentioning:
- When every shard uses a PSH topology, everything goes well: mongos CPU still fills up, but there is no steep drop.
- With taskExecutorPoolSize=8, YCSB with 48 threads mostly runs well (a steep drop still happens occasionally), but 96 threads still have problems.
How can I solve this problem?
duplicates
- SERVER-53853 Large buildup of mongos to mongod connections and low performance with secondaryPreferred reads (Closed)
related to
- SERVER-54504 Disable taskExecutorPoolSize for Linux (Closed)