Here are the steps to reproduce the deadlock:
- Run a cross-shard transaction with two participant shards, shard0 and shard1, where shard0 is the coordinator shard. Pause the TransactionCoordinator thread right before it writes the commit decision (i.e. after the transaction has entered the "prepared" state).
- Run a setFCV command against shard0. Wait until the setFCV thread is blocked waiting to acquire the global S lock (i.e. waiting for prepared transactions that existed before the FCV change to commit or abort).
- Unpause the TransactionCoordinator thread. The transaction cannot commit, because the TransactionCoordinator is now blocked waiting to acquire the IX lock on the config.transaction_coordinators collection, which it needs in order to write the commit decision. That IX request conflicts with the setFCV thread's pending S lock request.
- Both the setFCV thread and TransactionCoordinator thread now hang.
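
The circular wait in the steps above can be sketched with a toy model. This is only an illustration under stated assumptions, not MongoDB's lock manager: `FifoLock`, `blockers`, and `has_cycle` are hypothetical names, the model has a single resource with only the IX and S modes, and it assumes a fair queue in which a new request waits behind any conflicting request that is already pending.

```python
from collections import deque

# Assumption: only the two modes named in the steps. IX and S conflict with
# each other; IX/IX and S/S are compatible.
CONFLICTS = {("IX", "S"), ("S", "IX")}


def conflicts(a, b):
    return (a, b) in CONFLICTS


class FifoLock:
    """Toy single-resource lock with a fair wait queue (hypothetical)."""

    def __init__(self):
        self.granted = []       # (owner, mode) pairs currently holding the lock
        self.pending = deque()  # (owner, mode) pairs waiting, in arrival order

    def request(self, owner, mode):
        """Grant the lock if nothing granted or already queued conflicts."""
        held = any(conflicts(mode, m) for _, m in self.granted)
        queued = any(conflicts(mode, m) for _, m in self.pending)
        if held or queued:
            self.pending.append((owner, mode))
            return False
        self.granted.append((owner, mode))
        return True

    def blockers(self, owner):
        """Owners that a pending request from `owner` is waiting on."""
        waiting = list(self.pending)
        idx = next(i for i, (o, _) in enumerate(waiting) if o == owner)
        mode = waiting[idx][1]
        result = [o for o, m in self.granted if conflicts(mode, m)]
        result += [o for o, m in waiting[:idx] if conflicts(mode, m)]
        return result


def has_cycle(graph):
    """Depth-first search for a cycle in a wait-for graph."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        found = any(visit(n) for n in graph.get(node, []))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(n) for n in graph)


lock = FifoLock()
lock.request("prepared_txn", "IX")  # step 1: prepared transaction holds IX
lock.request("setFCV", "S")         # step 2: setFCV queues behind that IX
lock.request("coordinator", "IX")   # step 3: coordinator queues behind the S

waits_for = {
    "setFCV": lock.blockers("setFCV"),
    "coordinator": lock.blockers("coordinator"),
    # The prepared transaction cannot commit or abort (and release its lock)
    # until the coordinator persists its decision.
    "prepared_txn": ["coordinator"],
}
deadlocked = has_cycle(waits_for)
print(waits_for, deadlocked)
```

In this model the wait-for graph contains the cycle setFCV → prepared_txn → coordinator → setFCV, which corresponds to the hang described in the last step.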
- causes
  - SERVER-75205 Deadlock between stepdown and restoring locks after yielding when all read tickets exhausted (Closed)
- is related to
  - SERVER-60682 TransactionCoordinator may block acquiring WiredTiger write ticket to persist its decision, prolonging transactions being in the prepared state (Closed)
  - SERVER-57476 Operation may block on prepare conflict while holding oplog slot, stalling replication indefinitely (Closed)
  - SERVER-66340 Improve distributed transaction commit locking behavior (Closed)
  - SERVER-66341 Improve journal flusher locking behavior (Closed)
  - SERVER-66342 Remove resourceIdFeatureCompatibilityVersion (Closed)
- related to
  - SERVER-66719 dbCheck FCV lock upgrade causes deadlock with setFCV (Closed)
  - SERVER-66213 setFCV may need to wait for transactionLifetimeLimitSeconds (Open)