The multi_statement_transaction_kill_sessions_atomicity_isolation.js concurrency workload runs ordered updates in transactions under snapshot isolation, periodically kills random sessions, and finally validates that the transactions committed in the correct order.
Enabling this workload against a sharded cluster leads to failures which appear as if transactions committed out of order:
Error: [[ ]] != [[ { "tid" : 9, "iteration" : 14, "numUpdated" : 2 }, { "tid" : 8, "iteration" : 6, "numUpdated" : 3 }, { "tid" : 3, "iteration" : 4, "numUpdated" : 5 }, { "tid" : 9, "iteration" : 14, "numUpdated" : 2 }
These failures are not caused by a server bug. Interrupting a session that is running two-phase commit on mongos may still result in the transaction committing. Because the test then retries the entire transaction (with exactly the same parameters), the transaction ends up committing twice.
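The double commit can be illustrated with a toy model. This is only a sketch: `committedLog`, `makeCommit` and `runWholeTxnWithRetry` are hypothetical names invented for the illustration, and the "server" is a plain array rather than a real cluster. It assumes the client sees an Interrupted error (code 11601) even though the commit took effect.

```javascript
// Toy model only: hypothetical names, not the real workload or server code.
const committedLog = [];  // stands in for the documents the workload validates

// Hypothetical commit that takes effect on the "server" but reports an
// interruption to the client the first time it is called, as can happen
// when a session is killed while the commit is in progress.
function makeCommit() {
    let interruptedOnce = false;
    return function commit(update) {
        committedLog.push(update);  // the transaction really commits
        if (!interruptedOnce) {
            interruptedOnce = true;
            const err = new Error("operation was interrupted");
            err.code = 11601;  // Interrupted
            throw err;
        }
    };
}

// Retrying the whole transaction re-runs the same update as a new transaction.
function runWholeTxnWithRetry(update, commit) {
    for (;;) {
        try {
            commit(update);
            return;
        } catch (e) {
            if (e.code !== 11601) throw e;  // only the interruption is retried
        }
    }
}

runWholeTxnWithRetry({tid: 9, iteration: 14, numUpdated: 2}, makeCommit());
console.log(committedLog.length);  // 2: the same update was committed twice
```

In this model the same `{tid, iteration, numUpdated}` entry is recorded twice, which resembles the repeated entry in the error output above.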
Proposed fix
The fix would be to make withTxnAndAutoRetry retry just the commit if it fails, similar to what the drivers spec requires, namely:
commitTransaction is a retryable write command. Drivers MUST retry once after commitTransaction fails with a retryable error according to the Retryable Writes Specification, regardless of whether retryWrites is set on the MongoClient or not.
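A minimal sketch of the commit-only retry, not the actual withTxnAndAutoRetry change. `commitWithRetry` is a hypothetical helper, the session is assumed to expose a `commitTransaction()` method that throws errors carrying a numeric `code`, and the set of retryable codes is an illustrative assumption rather than the list the test helper or the drivers spec uses.

```javascript
// Hypothetical sketch: retry only commitTransaction, never the transaction body.
// The codes below are illustrative assumptions, not an authoritative list.
const RETRYABLE_COMMIT_CODES = new Set([
    11601,  // Interrupted (for example, the session was killed)
    6,      // HostUnreachable
    89,     // NetworkTimeout
]);

function commitWithRetry(session, maxRetries = 1) {
    for (let attempt = 0; ; attempt++) {
        try {
            // Only the commit is re-sent; the updates in the transaction body
            // are not executed again, so they cannot be applied twice.
            return session.commitTransaction();
        } catch (e) {
            const retryable = RETRYABLE_COMMIT_CODES.has(e.code);
            if (!retryable || attempt >= maxRetries) {
                throw e;  // give up: non-retryable error or retries exhausted
            }
        }
    }
}
```

The default of one retry mirrors the "MUST retry once" wording quoted above; a test helper might choose a larger bound, since sessions can be killed repeatedly.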
- has to be done before: SERVER-40183 Create kill_sessions version of multi_statement_transaction_simple.js concurrency workload (Closed)
- related to: SERVER-38297 Killing session on a secondary currently applying prepare oplog entry can fassert (Closed)
- related to: SERVER-39890 Make network_error_and_txn_override.js retry logic follow the driver spec as much as possible (Closed)