Note: this description is based on the v4.0 code. The code organization in current master has changed a little, but the story is the same.
1. Shard0 performs a retryable write with stmt: 0.
2. Shard0 migrates the chunk to Shard1.
3. The history of stmt: 0 is transferred to Shard1.
4. Time passes, and the oplog entry for stmt: 0 on Shard1 rolls over.
5. Shard0 migrates the chunk to Shard1 again. (Note: to trigger this bug, Shard0 must not have rolled over its oplog yet, so it still sends the history of stmt: 0!)
6. Shard1 checks whether the stmt has already been executed.
7. Shard1 already has the stmt in its cached map, but when it tries to retrieve the actual oplog entry, it realizes that the oplog was already truncated and throws IncompleteTransactionHistory.
8. The exception is caught, and the entry is let through as an attempt to "repair/recover" the lost history.
9. However, after the oplog entry is inserted and the commit callback executes, the callback finds that the stmt is already in the map and triggers the fassert.
- is duplicated by SERVER-40324: sharded cluster backtraces with errormessage "XXX was committed once with opTime YYY and a second time with opTime ZZZ" (Closed)