Type: Question
Resolution: Done
Priority: Major - P3
Affects Version/s: 3.0.5, 3.0.6
Component/s: Admin
We have a large sharded cluster in which a secondary occasionally becomes very slow and requires a restart to recover. All our reads go to secondaries. Inspecting the affected secondary's log, I see the number of connections rise by many thousands over the course of a couple of seconds. The log then fills with slow queries (each taking over 100 ms); it looks as though every query hitting the replica is slow. In normal operation these queries take only a few ms. After that, I see many lines like this one sprinkled between the slow queries:
[conn233831] killcursors keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { r: 2 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 52009 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } 83ms
Putting the replica into maintenance mode (sometimes for many hours) and then returning it to service does not fix the issue: the node continues to serve very slowly. Restarting the mongod process, however, does fix the problem. We have experienced this in versions 3.0.5 and 3.0.6, with both the MMAPv1 and WiredTiger storage engines.
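To quantify how much of each slow operation is lock wait rather than actual work, a line like the killcursors entry above can be parsed. The following is a minimal sketch in Python, assuming the 3.0-style log format shown in this ticket; the helper name `lock_wait_summary` and its regular expressions are illustrative and not part of any MongoDB tooling.

```python
import re

# The killcursors line quoted in this ticket, copied verbatim.
LINE = (
    "[conn233831] killcursors keyUpdates:0 writeConflicts:0 numYields:0 "
    "locks:{ Global: { acquireCount: { r: 2 }, acquireWaitCount: { r: 1 }, "
    "timeAcquiringMicros: { r: 52009 } }, Database: { acquireCount: { r: 1 } }, "
    "Collection: { acquireCount: { r: 1 } } } 83ms"
)


def lock_wait_summary(line):
    """Hypothetical helper: return (total_ms, lock_wait_ms) for one log line.

    Returns None if the line does not end with a duration such as "83ms".
    """
    total = re.search(r"(\d+)ms\s*$", line)
    if total is None:
        return None
    total_ms = int(total.group(1))

    # Sum every lock mode (r, w, R, W) in each timeAcquiringMicros block.
    wait_micros = 0
    for block in re.findall(r"timeAcquiringMicros:\s*\{([^}]*)\}", line):
        wait_micros += sum(int(n) for n in re.findall(r":\s*(\d+)", block))

    return total_ms, wait_micros / 1000.0


if __name__ == "__main__":
    total_ms, wait_ms = lock_wait_summary(LINE)
    print(f"{wait_ms:.1f} of {total_ms} ms spent waiting for locks")
```

For the line quoted above, this reports that roughly 52 ms of the 83 ms was spent waiting to acquire the Global lock in read mode, which suggests the operation was queued behind something else rather than slow in itself.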