- Type: Improvement
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: Sharding
- Cluster Scalability
- 5
When mongos encounters an error in a transaction and the error is "retryable" (e.g. a snapshot error on the first client statement), it removes the newly added participants from the participant list and sends abortTransaction to each of them before retrying the failed statement. This guarantees that no transaction is left open on a shard that the retry does not target. To prevent the retry from racing with the aborts, the router must wait for a response to abortTransaction from each cleared participant before it sends the retry.
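The current behavior can be sketched roughly as follows. This is an illustrative Python model only, not mongos code: the function name `clear_and_abort_new_participants`, the `send_abort` callback, and the shard names are all hypothetical stand-ins for the router's internals.

```python
from concurrent.futures import ThreadPoolExecutor


def clear_and_abort_new_participants(participants, added_this_statement, send_abort):
    """Model of the current retry path (hypothetical names throughout).

    participants:          ordered list of shard ids in the participant list
    added_this_statement:  set of shard ids first targeted by the failed statement
    send_abort:            callable that sends abortTransaction to one shard and
                           returns only once that shard has responded
    """
    cleared = [s for s in participants if s in added_this_statement]
    remaining = [s for s in participants if s not in added_this_statement]

    if cleared:
        # Aborts go out in parallel, but the router blocks until every
        # response is in. This wait is the delay described above.
        with ThreadPoolExecutor(max_workers=len(cleared)) as pool:
            list(pool.map(send_abort, cleared))

    # Only now is it safe to retry the failed statement.
    return remaining, cleared
```

In this model the retry latency is bounded below by the slowest abortTransaction response among the cleared participants.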
To avoid this delay, the router could skip the abort and send the retry immediately after clearing the new participants from the participant list. A shard that still has an unaborted transaction from the first attempt would then have to implicitly abort its local transaction before servicing the retry (this behavior would need to be added). To guarantee that no transactions are left open, the router should track every participant that was ever targeted and, when the transaction reaches a terminal state (commit, abort, or implicit abort), send abortTransaction to those that were targeted but are not in the final participant list.
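A minimal sketch of the proposed bookkeeping, assuming a simple in-memory tracker. The class `ParticipantTracker` and its method names are hypothetical and chosen for illustration; they do not correspond to existing router code.

```python
class ParticipantTracker:
    """Hypothetical model of the proposed router-side tracking."""

    def __init__(self):
        self.participants = []      # current participant list, in targeting order
        self.ever_targeted = set()  # every shard targeted at any point

    def target(self, shard):
        """Record that a statement (or a retry) was sent to this shard."""
        self.ever_targeted.add(shard)
        if shard not in self.participants:
            self.participants.append(shard)

    def clear_for_retry(self, added_this_statement):
        """Drop newly added participants without sending abortTransaction.

        The shards stay in ever_targeted so they can be cleaned up later.
        """
        self.participants = [
            s for s in self.participants if s not in added_this_statement
        ]

    def shards_to_abort_at_terminal_state(self):
        """Shards that were targeted but are not in the final participant list.

        Intended to be called on commit, abort, or implicit abort.
        """
        return sorted(self.ever_targeted - set(self.participants))
```

For example, if a statement targets shards `s2` and `s3`, fails with a retryable error, and the retry targets only `s2`, then `s3` is the only shard the tracker reports for cleanup at the terminal state. A shard such as `s2` that is targeted again relies on the implicit abort described above.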