When an aggregation running inside a transaction has its cursor killed, a race condition can let the transaction commit without including all added participants. Specifically, when `AsyncResultsMerger` (ARM) sends a getMore request to a shard, and the cursor is subsequently killed (for example, because a results $limit is reached) and the transaction is committed, ARM does not wait for any response from the shards. This breaks the support for a participant adding more participants to a transaction, because the added participants may not be propagated to the transaction coordinator before the transaction commits. The added participants then hold on to their transaction resources until the transaction is aborted, either by the start of a subsequent transaction on the session or by the transaction timeout.
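The sequence described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `Router`, `on_response`, `commit`, `run`, and the shard names are hypothetical and do not correspond to actual server classes or APIs. It assumes the router learns about added participants only from a shard's getMore response.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Hypothetical stand-in for the router's view of one transaction."""
    participants: set = field(default_factory=set)

    def on_response(self, added_participants):
        # Added participants are learned only by processing a shard response.
        self.participants |= set(added_participants)

    def commit(self):
        # Commit reaches only the participants known at this moment.
        return set(self.participants)


def run(wait_for_response: bool):
    router = Router(participants={"shardA"})

    # shardA serves the getMore and pulls shardB into the transaction.
    actual_participants = {"shardA", "shardB"}
    pending_response = ["shardB"]  # still in flight back to the router

    if wait_for_response:
        router.on_response(pending_response)
    # Otherwise the cursor is killed and the response is never processed.

    committed = router.commit()
    orphaned = actual_participants - committed
    return committed, orphaned
```

In this model, `run(wait_for_response=False)` commits with only `shardA` and leaves `shardB` orphaned, which corresponds to a participant holding its transaction resources until a later abort or timeout.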
Triggering this race condition does not cause a correctness issue as long as the getMore on the killed cursor neither performs nor triggers any writes, and this restriction is currently in place. However, if an aggregation's getMore requests could trigger writes, the race could cause unspecified incorrect behavior due to an improper transaction commit. PM-1247, which would add write capability to aggregations in transactions, exists but is unscheduled.
This potential bug is not resolved. To prevent any release from shipping with it, this ticket adds an invariant that makes the server fail immediately if a write is attempted as part of an aggregation pipeline with an async getMore within a transaction.
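The shape of such a guard can be sketched as follows. This is a simplified model, not the server's implementation: `InvariantFailure`, `check_no_write_in_async_get_more`, and its three boolean parameters are hypothetical names chosen for illustration, and raising an exception stands in for the server failing immediately.

```python
class InvariantFailure(AssertionError):
    """Hypothetical stand-in for the server failing on a violated invariant."""


def check_no_write_in_async_get_more(in_transaction: bool,
                                     is_async_get_more: bool,
                                     is_write: bool) -> None:
    # The guard fires only when all three conditions hold together:
    # a write, issued from an async getMore, inside a transaction.
    if in_transaction and is_async_get_more and is_write:
        raise InvariantFailure(
            "write attempted from an async getMore inside a transaction"
        )


# Reads in the same situation pass through untouched.
check_no_write_in_async_get_more(True, True, is_write=False)
```

The point of the design is to fail loudly at the first attempted write instead of letting a transaction commit with participants the coordinator never learned about.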
More information here: https://docs.google.com/document/d/1Czt5q5VrTx3mB7rHRVKvZmMjr8hZeH8T2as5DLVOdGQ/edit?usp=sharing
- related to SERVER-90441 Ensure transaction participants which are outstanding to top-level TransactionRouter only did reads at time of prepare
- Backlog