- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Replication
- None
- Fully Compatible
- ALL
- v4.2
- Repl 2019-07-15, Repl 2019-07-29
Assume that only one write ticket is available (number of write tickets = 1), and consider the following sequence.
1) A transaction is prepared and waits to be committed. Once the prepare succeeds on the primary, stashing the transaction's lock resources releases the ticket but keeps the global lock held in IX mode.
2) Commands that are not running in a transaction (such as create, find, insert) then come in and acquire the ticket and the global lock, but block behind the prepared transaction on a prepare conflict or a DB/collection-level lock conflict.
3) Next, the commitTransaction command comes in and, as part of unstashing the lock resources, tries to reacquire the ticket. It cannot, because the ticket is held by the non-transactional operations from step 2, so it blocks.
For a cross-shard transaction, the transaction coordinator keeps retrying the commitTransaction command until it succeeds, but because of the deadlock above the primary never makes progress. The deadlock occurs on the primary because the transaction violates the acquisition ordering when it unstashes its lock resources: it acquires the ticket while already holding the global lock.
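The sequence can be modelled with a small, deterministic sketch. This is not server code: `simulate`, its `commit_needs_ticket` flag, and the returned keys are hypothetical names chosen for illustration, and the ticket pool is approximated with a plain semaphore of size 1. Non-blocking acquires stand in for "would block forever", so the script reports the deadlock instead of hanging.

```python
import threading


def simulate(commit_needs_ticket: bool) -> dict:
    """Toy model of steps 1-3 with a write-ticket pool of size 1."""
    tickets = threading.Semaphore(1)  # number of write tickets available = 1

    # Step 1: the transaction takes the ticket and prepares. Stashing its
    # resources returns the ticket; the global IX lock stays held (not modelled).
    tickets.acquire()
    prepared = True
    tickets.release()

    # Step 2: a non-transactional op takes the only ticket, then waits on the
    # prepare conflict until the transaction commits or aborts.
    op_has_ticket = tickets.acquire(blocking=False)
    op_blocked = op_has_ticket and prepared

    # Step 3: commitTransaction unstashes. If it must reacquire a ticket, the
    # non-blocking acquire fails because the op from step 2 still holds it.
    if commit_needs_ticket:
        commit_runs = tickets.acquire(blocking=False)
    else:
        commit_runs = True  # hypothetical variant: commit skips the ticket

    if commit_runs:
        # The commit resolves the prepare conflict, so the op from step 2
        # can finish and hand its ticket back to the pool.
        prepared = False
        op_blocked = False
        tickets.release()

    return {
        "op_has_ticket": op_has_ticket,
        "commit_runs": commit_runs,
        "deadlock": op_blocked and not commit_runs,
    }


if __name__ == "__main__":
    print(simulate(commit_needs_ticket=True))
    print(simulate(commit_needs_ticket=False))
```

With `commit_needs_ticket=True` the model ends in the wait-for cycle described above: the op holds the ticket and waits for the commit, while the commit waits for the ticket. The `commit_needs_ticket=False` variant is only an assumption used to show that the cycle disappears once the commit no longer waits on the ticket pool.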
Note: This is a problem only for prepared transactions (the combination of the commitTransaction command and a cross-shard transaction), not for unprepared transactions. An unprepared transaction gets aborted either by the transaction reaper or by a higher transaction number (see SERVER-41976), which allows the operations in step 2 to proceed.
Related to:
- SERVER-41556 Must handle failure to reacquire locks and ticket when unstashing transaction (Closed)
- SERVER-92292 Skip ticket acquisition for prepareTransaction (Closed)
- SERVER-42398 abortTransaction and commitTransaction commands should not acquire ticket irrespective of the prepared state. (Closed)