When a primary steps down, _stepDownFinish kills all sessions through invalidateSessionsForStepDown -> killSessionsAction -> checkOutSessionForKill. This happens after the step-down thread has acquired the RSTL (replication state transition lock) via AutoGetRstlForStepUpStepDown.
If the step-down thread tries to kill a session that is already checked out, it must wait until that session is checked back in, which creates a potential for deadlock.
Normally the operation that has the session checked out is interrupted when it tries to acquire a lock such as the GlobalLock. However, if the session's opCtx is marked as uninterruptible, the operation can stay blocked on the GlobalLock while the step-down thread (which holds the RSTL) waits on the checked-out session, causing a deadlock.
This can happen during profiling. More generally, it may be possible with other uses of UninterruptibleLockGuard.
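The wait cycle described above can be sketched as a small toy model. This is not server code: it is written in Python for brevity, and every name in it (step_down_completes, rstl, kill_signal, abort_demo) is hypothetical. It also assumes that the lock the operation wants cannot be granted while the step-down thread holds the RSTL. An interruptible lock wait notices the kill and checks the session in, while an uninterruptible one keeps waiting.

```python
import threading


def step_down_completes(uninterruptible: bool, timeout: float = 0.5) -> bool:
    """Toy model: return True if step-down sees the session checked back in."""
    rstl = threading.Lock()           # stands in for the RSTL (assumption)
    kill_signal = threading.Event()   # set when step-down kills the session
    op_started = threading.Event()    # operation has the session checked out
    checked_in = threading.Event()    # operation checked the session back in
    abort_demo = threading.Event()    # escape hatch so the demo never hangs

    def operation() -> None:
        op_started.set()
        # Wait for a lock that cannot be granted while step-down holds `rstl`.
        while not abort_demo.is_set():
            if rstl.acquire(timeout=0.01):
                rstl.release()
                break
            if kill_signal.is_set() and not uninterruptible:
                break  # an interruptible wait notices the kill and gives up
        checked_in.set()

    rstl.acquire()                    # step-down takes the RSTL first
    worker = threading.Thread(target=operation)
    worker.start()
    op_started.wait()
    kill_signal.set()                 # analogous to killing the session
    finished = checked_in.wait(timeout)  # analogous to waiting for check-in
    abort_demo.set()                  # unblock the worker in the stuck case
    rstl.release()
    worker.join()
    return finished
```

In this model, step_down_completes(uninterruptible=False) returns True, and step_down_completes(uninterruptible=True) returns False once the timeout expires. The timeout and the abort_demo flag exist only so the demonstration terminates; in the scenario the ticket describes, nothing plays that role, so both threads would wait indefinitely.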
- causes
  - SERVER-63143 Operation can be interrupted by maxTimeMS timeout while waiting for lock even if _ignoreInterruptsExceptForReplStateChange is set (Closed)
- is related to
  - SERVER-57756 Race between concurrent stepdowns and applying transaction oplog entry (Closed)
- related to
  - SERVER-60161 Deadlock between config server stepdown and _configsvrRenameCollectionMetadata command (Closed)
  - SERVER-59673 Investigate better solutions for fixing the deadlock issue in profiling (Closed)