Race:
1. A findAndModify write with txnNumber 10 is executed on shardA.
2. Migration of a chunk from shardA to shardB starts.
3. The session migration thread pulls the oplog entry for the write in step 1, passes all the checks, and is about to write the oplog entry here.
4. A new retryable write on the same session with txnNumber 11 starts and successfully writes to the oplog.
5. The session migration thread writes the oplog entry for txnNumber 10. The primary has now written an oplog entry with a higher optime but a lower txnNumber than the entry written in step 4.
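The five steps can be replayed as a toy, single-process model. Everything here is hypothetical and only illustrates the interleaving: a plain dict stands in for the recipient's session state, the `Oplog` class assigns optimes in append order, and two `threading.Event`s force steps 3 to 5 to run in exactly the order listed. The key point is that the check in step 3 and the write in step 5 are not done atomically with respect to other writes on the same session.

```python
import threading

LSID = "lsid-1"  # hypothetical session id shared by both writes


class Oplog:
    """Toy oplog: optimes are assigned in append order, like a primary's clock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.entries = []

    def append(self, lsid, txn_number, source):
        with self._lock:
            optime = len(self.entries) + 1
            self.entries.append(
                {"optime": optime, "lsid": lsid, "txnNumber": txn_number, "source": source}
            )


def migration_thread(oplog, session, checks_passed, newer_write_done):
    # Step 3: the migrated entry (txnNumber 10) is checked against the session,
    # but the session is NOT held between this check and the write below.
    stale = session["txnNumber"] > 10
    checks_passed.set()
    if stale:
        return
    newer_write_done.wait()  # race window: step 4 runs here
    # Step 5: the stale entry is written and receives the later optime.
    oplog.append(LSID, 10, "migration")


def retryable_write_thread(oplog, session, checks_passed, newer_write_done):
    checks_passed.wait()
    # Step 4: a new retryable write with txnNumber 11 on the same session.
    session["txnNumber"] = 11
    oplog.append(LSID, 11, "client")
    newer_write_done.set()


def run_race():
    oplog = Oplog()
    session = {"txnNumber": -1}  # the recipient has not seen this session yet
    checks_passed = threading.Event()
    newer_write_done = threading.Event()
    args = (oplog, session, checks_passed, newer_write_done)
    threads = [
        threading.Thread(target=migration_thread, args=args),
        threading.Thread(target=retryable_write_thread, args=args),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return oplog.entries


if __name__ == "__main__":
    for entry in run_race():
        print(entry)
```

The resulting oplog holds txnNumber 11 at optime 1 followed by txnNumber 10 at optime 2, which is the "higher optime but lower txnNumber" state described in step 5.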
Consequence:
Secondaries can potentially hit this fassert:
https://github.com/mongodb/mongo/blob/r4.0.15/src/mongo/db/repl/session_update_tracker.cpp#L98
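Why a secondary trips over this oplog can be sketched with a simplified model, assuming the check rejects an entry whose txnNumber is lower than one already seen for the same session in optime order. The class and function names are hypothetical and not taken from the linked file, and the real check may be narrower in scope (for example, limited to entries within one applier batch). Raising an exception stands in for the fassert, which in the server terminates the process instead.

```python
class FatalAssertion(Exception):
    """Stand-in for an fassert; the real server terminates instead of raising."""


class SessionTxnTracker:
    """Hypothetical model of a per-session txnNumber check on a secondary."""

    def __init__(self):
        self._latest = {}  # lsid -> txnNumber of the latest entry seen

    def observe(self, entry):
        lsid, txn = entry["lsid"], entry["txnNumber"]
        prev = self._latest.get(lsid)
        if prev is not None and txn < prev:
            raise FatalAssertion(
                f"session {lsid}: txnNumber {txn} < {prev} at optime {entry['optime']}"
            )
        self._latest[lsid] = txn


def apply_oplog(entries):
    """Apply entries in optime order, as a secondary would."""
    tracker = SessionTxnTracker()
    for entry in sorted(entries, key=lambda e: e["optime"]):
        tracker.observe(entry)


# Oplog left behind by the race: txnNumber 11 at optime 1, txnNumber 10 at optime 2.
RACED = [
    {"optime": 1, "lsid": "lsid-1", "txnNumber": 11},
    {"optime": 2, "lsid": "lsid-1", "txnNumber": 10},
]
# The same two writes with optimes that follow txnNumber order.
ORDERED = [
    {"optime": 1, "lsid": "lsid-1", "txnNumber": 10},
    {"optime": 2, "lsid": "lsid-1", "txnNumber": 11},
]

if __name__ == "__main__":
    apply_oplog(ORDERED)  # passes the check
    try:
        apply_oplog(RACED)
    except FatalAssertion as exc:
        print("secondary would crash:", exc)
```

Because every secondary replays the same oplog, all of them hit the check at the same entry, which is consistent with every secondary in a replica set crashing together.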
Note: this race is no longer possible in v4.2 because the session migration thread checks out the session when it processes the oplog entries, so the interleaving above can no longer occur.
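The effect of checking out the session can be modeled as holding a per-session mutex across both the check and the write. This is a hypothetical sketch, not the v4.2 implementation: `CheckedOutSession`, `migrate_entry`, and `retryable_write` are illustrative names, and skipping a stale migrated entry is an assumption, since what the server actually does with such an entry is not described here. Whichever write acquires the session first, optime order and txnNumber order can no longer disagree.

```python
import threading


class CheckedOutSession:
    """Hypothetical session; holding `mutex` models checking the session out."""

    def __init__(self):
        self.mutex = threading.Lock()
        self.txn_number = -1


class Oplog:
    def __init__(self):
        self._lock = threading.Lock()
        self.entries = []

    def append(self, lsid, txn_number):
        with self._lock:
            self.entries.append(
                {"optime": len(self.entries) + 1, "lsid": lsid, "txnNumber": txn_number}
            )


def migrate_entry(session, oplog, lsid, txn_number):
    # Check and write both happen while the session is checked out, so no
    # other write on this session can slip in between them.
    with session.mutex:
        if session.txn_number > txn_number:
            return False  # assumption: a stale migrated entry is skipped
        session.txn_number = txn_number
        oplog.append(lsid, txn_number)
        return True


def retryable_write(session, oplog, lsid, txn_number):
    with session.mutex:
        if txn_number < session.txn_number:
            raise ValueError("txnNumber is older than the session's current one")
        session.txn_number = txn_number
        oplog.append(lsid, txn_number)


def txn_numbers_monotonic(entries):
    """True if txnNumbers never decrease in optime order."""
    txns = [e["txnNumber"] for e in sorted(entries, key=lambda e: e["optime"])]
    return all(a <= b for a, b in zip(txns, txns[1:]))


def run_once():
    session, oplog = CheckedOutSession(), Oplog()
    threads = [
        threading.Thread(target=migrate_entry, args=(session, oplog, "lsid-1", 10)),
        threading.Thread(target=retryable_write, args=(session, oplog, "lsid-1", 11)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return oplog.entries
```

If the migration thread wins, the oplog holds txnNumber 10 then 11; if the new retryable write wins, the stale entry is never written. Neither outcome produces the out-of-order pair from step 5.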
Here are the conditions required to hit this race:
- running a version older than v4.2
- using retryable writes with findAndModify
- chunk migrations happening while retryable writes are in use

Duplicates:
- SERVER-44055 All secondary crashed in SessionUpdateTracker and cannot recovery (Closed)