Taking this lock to check the state of the backupCursor has the unintended side effect of blocking in-flight serverStatus commands, which creates gaps in FTDC data.
I have written a reproducer that deterministically produces a 10-second gap in FTDC data caused by waiting on this mutex.
I propose that, instead of taking a lock, we store the state in an atomic and read it with the default memory order (sequentially consistent for std::atomic).
This mutex was added in SERVER-37662, which added backupCursor status to FTDC data. A second commit on that ticket reads: "Fix concurrency for backupCursor state read." Originally a data race existed, and the lock fixed it.
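A minimal sketch of the proposed change, assuming the state that serverStatus needs fits in a single enum value. The names `BackupCursorState`, `BackupCursorStatus`, `setState`, and `getState` are hypothetical and not taken from the server code; the real code would presumably use the server's own atomic wrapper rather than `std::atomic` directly.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the backup cursor state; the real enum and its
// values in the server may differ.
enum class BackupCursorState : int { kInactive = 0, kOpen = 1 };

// Hypothetical holder for the state. The writer publishes changes with an
// atomic store, and the reader observes them with an atomic load.
class BackupCursorStatus {
public:
    // Writer side: called by whoever opens or closes the backup cursor.
    void setState(BackupCursorState s) {
        _state.store(s);  // default order: std::memory_order_seq_cst
    }

    // Reader side: what a serverStatus section would call. No mutex is
    // taken, so this cannot block behind a long-running lock holder.
    BackupCursorState getState() const {
        return _state.load();  // default order: std::memory_order_seq_cst
    }

private:
    std::atomic<BackupCursorState> _state{BackupCursorState::kInactive};
};
```

An atomic load still avoids the data race that the lock was introduced to fix, but it only gives a consistent view of this one value. If serverStatus has to report several backup cursor fields that must agree with each other, a single atomic would not be enough.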
- related to SERVER-93126 FTDC collection can block on ReplicationCoordinator mutex (Backlog)