-
Type: Improvement
-
Resolution: Unresolved
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: Replication
-
Replication
-
Repl 2019-04-22, Repl 2019-05-06, Repl 2019-05-20
Currently, when two concurrent step downs are triggered (either a conditional and an unconditional step down, or two conditional step downs), the step down thread can kill transaction operations that are being processed by secondary oplog application.
Consider the scenario below, and assume that node A is in the PRIMARY state.
1) User executes replSetStepDown cmd (Thread X).
2) Thread X is at this line.
3) Node A notices via heartbeat that a new term has begun, so node A steps down via the unconditional stepdown code path.
4) Node A is now in the SECONDARY state.
5) Node A's oplog application tries to apply a prepare/commit oplog entry. This requires secondary oplog application to check out the session. Let's assume that oplog application thread Y tries to apply a commit oplog entry and is at this line.
6) A read operation comes in (Thread Z) and acquires the RSTL in IX mode and the global lock in IS mode. It is then blocked by thread Y due to a prepare conflict (a conflict at the document level).
7) Thread X resumes and enqueues its request for the RSTL in X mode, because it is blocked by the read thread (thread Z).
8) Thread X starts "RstlKillOpthread". RstlKillOpthread then marks thread Y (which belongs to secondary oplog application) as killed as part of killSessionsAbortUnpreparedTransactions.
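The interleaving in steps 6 to 8 can be sketched as a small Python model. This is only an illustration, not the server's implementation: the compatibility table is the standard multi-granularity lock matrix, and `can_grant`, `Op`, `replay_scenario`, and the `waits_on` map are hypothetical names introduced here.

```python
from dataclasses import dataclass

# Standard multi-granularity lock compatibility (assumed here for illustration):
# for each requested mode, the set of held modes it can coexist with.
COMPATIBLE = {
    "IS": {"IS", "IX", "S"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "X": set(),
}


def can_grant(requested, held_modes):
    """Return True if `requested` is compatible with every held mode."""
    return all(held in COMPATIBLE[requested] for held in held_modes)


@dataclass
class Op:
    """Hypothetical stand-in for an operation running on node A."""
    name: str
    is_oplog_applier: bool = False
    killed: bool = False


def replay_scenario():
    y = Op("Y", is_oplog_applier=True)  # step 5: applying the commit oplog entry
    rstl_held = {"Z": "IX"}             # step 6: read thread Z holds the RSTL in IX
    waits_on = {"Z": "Y"}               # step 6: Z hits a prepare conflict with Y

    # Step 7: thread X requests the RSTL in X mode and must queue behind Z.
    x_granted = can_grant("X", rstl_held.values())
    if not x_granted:
        waits_on["X"] = "Z"
        # Step 8: the kill-op thread marks Y as killed without checking
        # whether Y belongs to secondary oplog application.
        y.killed = True
    return x_granted, waits_on, y


if __name__ == "__main__":
    granted, waits_on, y = replay_scenario()
    print("X granted RSTL:", granted)
    print("wait chain:", waits_on)
    print("oplog applier killed:", y.killed)
```

In this model, X waits on Z and Z waits on Y, while the only thread that could unblock the chain (Y) is the one that gets marked as killed.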
- depends on
-
SERVER-37574 Force reconfig should kill user operations
- Closed
- is related to
-
SERVER-37348 TransactionReaper and periodic transaction abort thread shouldn't abort transactions on secondaries
- Closed
- related to
-
SERVER-40700 Deadlock between read prepare conflicts and state transitions
- Closed
- split to
-
SERVER-41283 Add test that running stepdown on secondary does not lead to 3 way deadlock
- Closed