Type: Task
Resolution: Unresolved
Priority: Major - P3
None
Affects Version/s: None
Component/s: None
Replication
Repl 2023-04-17, Repl 2023-05-01, Repl 2023-05-15
135
Currently, the killop thread kills operations that took the global lock in a mode that conflicts with writes. We did not kill operations that held the RSTL because, when the killop thread was added, reads held the RSTL (this was safe because long-running reads periodically yield). Not killing them gave a better user experience, since otherwise readers would have to handle interruption during failovers.
Since the introduction of lock-free reads, many reads no longer take the RSTL, so we should be able to start killing operations that hold the RSTL on stepdown.
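A minimal sketch of the kill decision before and after the proposed change, assuming each operation exposes two flags: whether it took the global lock in a mode that conflicts with writes, and whether it holds the RSTL. `OpInfo`, `selectOpsToKill`, and `killRstlHolders` are hypothetical names for illustration, not the server's actual API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical snapshot of one operation's lock state (not a real server type).
struct OpInfo {
    std::string id;
    bool globalLockConflictsWithWrites;  // took the global lock in a write-conflicting mode
    bool holdsRstl;                      // holds the RSTL (lock-free reads do not)
};

// Returns the ids of operations the killop thread would interrupt on stepdown.
// killRstlHolders == false models the current policy; true models the proposal.
std::vector<std::string> selectOpsToKill(const std::vector<OpInfo>& ops, bool killRstlHolders) {
    std::vector<std::string> toKill;
    for (const auto& op : ops) {
        if (op.globalLockConflictsWithWrites || (killRstlHolders && op.holdsRstl)) {
            toKill.push_back(op.id);
        }
    }
    return toKill;
}
```

With a writer `{"w", true, true}`, a reader that still takes the RSTL `{"r", false, true}`, and a lock-free reader `{"lf", false, false}`, the current policy selects only `w`, while the proposed one selects `w` and `r`. The lock-free reader is left alone either way, which is why the user-experience cost of the change should now be smaller.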
This has the benefit of preventing future deadlocks in situations where a thread takes the global lock in IS mode (which implicitly also takes the RSTL) and then blocks waiting for a database lock in S mode that conflicts with a prepared transaction. If the node is trying to step down, the prepared transaction is blocked from committing, but stepdown cannot acquire the RSTL because the reader thread already holds it. The result is a cycle: the reader waits on the prepared transaction, the prepared transaction waits on stepdown, and stepdown waits on the reader.
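The deadlock above can be viewed as a cycle in a wait-for graph. The sketch below, under the simplifying assumption that each blocked party waits on exactly one other party, encodes the three waits described in this ticket and detects the cycle. The party names, `WaitFor`, `hasCycle`, and `stepdownScenario` are hypothetical and model only these waits, not the lock manager itself.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Each blocked party maps to the single party it is waiting on (simplified wait-for graph).
using WaitFor = std::map<std::string, std::string>;

// Follows wait-for edges from every node; reaching a node already on the path means a deadlock.
bool hasCycle(const WaitFor& waits) {
    for (const auto& entry : waits) {
        std::set<std::string> path{entry.first};
        auto it = waits.find(entry.first);
        while (it != waits.end()) {
            const std::string& target = it->second;
            if (path.count(target)) return true;
            path.insert(target);
            it = waits.find(target);
        }
    }
    return false;
}

WaitFor stepdownScenario() {
    return {
        // Reader holds the RSTL (via global IS) and waits for a DB lock in S mode
        // that conflicts with the prepared transaction.
        {"reader", "preparedTxn"},
        // The prepared transaction cannot commit while the node is trying to step down.
        {"preparedTxn", "stepdown"},
        // Stepdown cannot acquire the RSTL because the reader already holds it.
        {"stepdown", "reader"},
    };
}
```

`hasCycle(stepdownScenario())` returns true. Interrupting the reader, as the proposal would, removes both its own wait and stepdown's wait on it; erasing the `"reader"` and `"stepdown"` entries leaves only the prepared transaction waiting on stepdown, and `hasCycle` returns false.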
This work might also fix existing deadlocks of this kind that we have not noticed yet. However, I am not yet sure what complications or side effects this change would introduce.
is depended on by
- SERVER-91733 Remove the use of UninterruptibleLockGuard in ReplicationCoordinatorExternalState (Open)
is related to
- SERVER-75285 Deadlock between ShardsvrCheckMetadataConsistencyParticipantCommand, prepared transactions, and stepdown (Closed)
- SERVER-78662 Deadlock with index build, step down, prepared transaction, and MODE_IS coll lock (Closed)
- SERVER-71198 Assert that unkillable operations that take X collection locks do not hold the RSTL (Backlog)