- Type: Bug
- Resolution: Duplicate
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: None
- Sharding NYC
- ALL
While performance testing for SERVER-79056, I noticed that throughput for transactions that use two-phase commit scales poorly with more concurrent transactions, despite CPU and IO utilization staying low and secondaries keeping up. The problem seems to be that the WaitForMajorityService, which two-phase commit coordinators use to wait for the participant list and decision writes to majority replicate, can't keep up with many concurrent requests to wait for majority.
When I switch transaction coordinators to either wait for majority write concern as part of the writes themselves (which synchronously blocks a task executor thread) or wait asynchronously using ReplicationCoordinator::awaitReplicationAsyncNoWTimeout, throughput with the same workload goes up significantly (over 4x with my setup) and CPU becomes the bottleneck. I initially saw this in the shard DSI workload with a custom 0.3 ms network delay, which uses 3-node replica sets, but I also reproduced it in a modified shard workload with single-node replica sets.
The problem with the WaitForMajorityService seems to be that each loop of _periodicallyWaitForMajority() waits only for the lowest opTime it has been given, so if new opTimes arrive faster than it can wait for them, requests queue up and latency increases significantly. I modified the service to read the latest committed snapshot opTime after each majority wait (using ReplicationCoordinator::getCurrentCommittedSnapshotOpTime) and, if that opTime is greater than the one actually waited on, to treat it as the most recently waited-for time. That seemed to resolve the bottleneck as well.
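To illustrate the queueing effect described above, here is a toy Python model, not the server's C++ code. It assumes opTimes are plain integers, that each loop performs one blocking wait on the lowest pending opTime, and that the commit point only moves forward. The function `count_wait_loops` and its parameters are hypothetical names made up for this sketch.

```python
def count_wait_loops(pending_op_times, committed_op_time, use_committed_snapshot):
    """Count the wait loops needed to complete every pending request.

    Toy model only: opTimes are integers and each loop iteration stands in
    for one blocking wait-for-majority round on the lowest pending opTime.
    """
    pending = sorted(pending_op_times)
    loops = 0
    while pending:
        lowest = pending[0]
        loops += 1
        # Once the wait on `lowest` returns, the commit point is at least `lowest`.
        committed_op_time = max(committed_op_time, lowest)
        last_waited = lowest
        if use_committed_snapshot:
            # Modeled fix: treat the committed snapshot opTime as the last
            # waited-for time when it is ahead of the opTime waited on.
            last_waited = max(last_waited, committed_op_time)
        # Complete every request at or below the last waited-for opTime.
        pending = [t for t in pending if t > last_waited]
    return loops


if __name__ == "__main__":
    op_times = list(range(1, 101))  # 100 requests, all already majority committed
    print(count_wait_loops(op_times, 100, use_committed_snapshot=False))
    print(count_wait_loops(op_times, 100, use_committed_snapshot=True))
```

In this model, a backlog of distinct opTimes that are all already majority committed costs one loop per opTime without the change, and a single loop with it. This matches the direction of the improvement I saw, though the model says nothing about actual latencies.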
- duplicates: SERVER-79881 Integrate WaitForMajorityService with ReplicationCoordinator (Open)
- is related to: SERVER-79881 Integrate WaitForMajorityService with ReplicationCoordinator (Open)