- Type: Task
- Resolution: Won't Do
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: None
- Cluster Scalability
- Cluster Scalability 2024-5-13, Cluster Scalability 2024-5-27, Cluster Scalability 2024-6-10, Cluster Scalability 06/24/24, Cluster Scalability 2024-07-08
When an aggregation runs inside a transaction and its cursor is killed, a race condition can let the transaction commit without all of the participants that were added by other participants (PM-2844). Specifically, when AsyncResultsMerger (ARM) sends a getMore request to a shard, and the cursor is subsequently killed (for example, because a $limit is reached) and the transaction is committed, ARM does not wait for the outstanding shard response, which may indicate that participants have been added. The transaction can therefore commit before the added-participant information reaches the transaction coordinator. The omitted participants then hold their transaction resources until the transaction is aborted, either by the start of a subsequent transaction with a higher txn number or by the transaction timeout.
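The sequence described above can be sketched as a toy, single-threaded model. This is not MongoDB server code: `Shard`, `Router`, `run`, and the `wait_for_getmore_response` flag are hypothetical names chosen for illustration, and the real ARM and participant-tracking logic is considerably more involved. The sketch only shows why committing before the in-flight getMore response is processed leaves an added participant holding resources.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Shard:
    """Hypothetical stand-in for a shard taking part in a transaction."""
    name: str
    holds_txn_resources: bool = False

    def join_txn(self) -> None:
        self.holds_txn_resources = True

    def commit_txn(self) -> None:
        self.holds_txn_resources = False


@dataclass
class Router:
    """Hypothetical stand-in for the component that tracks participants."""
    participants: List[Shard] = field(default_factory=list)

    def learn(self, shards: List[Shard]) -> None:
        # Record any participant not already known by name.
        for shard in shards:
            if all(p.name != shard.name for p in self.participants):
                self.participants.append(shard)

    def commit(self) -> None:
        # Commit only reaches the participants known at this moment.
        for shard in self.participants:
            shard.commit_txn()


def run(wait_for_getmore_response: bool) -> List[str]:
    """Return the names of shards still holding resources after commit."""
    shard_a, shard_b = Shard("A"), Shard("B")
    router = Router()

    # The initial aggregate targets shard A.
    shard_a.join_txn()
    router.learn([shard_a])

    # A getMore goes to A; while serving it, A adds shard B as a participant.
    # That fact travels back in A's response, which is still in flight.
    shard_b.join_txn()
    in_flight_response = [shard_b]

    # The cursor is killed here (for example, a $limit is reached).
    if wait_for_getmore_response:
        router.learn(in_flight_response)

    router.commit()
    return [s.name for s in (shard_a, shard_b) if s.holds_txn_resources]
```

In this model, `run(False)` leaves shard B holding its transaction resources, while `run(True)` leaves no shard holding any.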
Triggering this race condition does not cause a correctness issue as long as the getMore neither involves nor triggers any writes (a restriction that is currently in place). The added participant that is erroneously omitted from the transaction commit only returns read results, and the client never sees those results because the cursor has been killed. However, the added participant holds its transaction resources as described above, and a particular load could make this additional resource overhead trigger a server failure.
We need to characterize a workload that causes a shard to hold enough transaction resources to cause performance degradation or server failure.
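As a starting point for that characterization, a rough steady-state estimate can be made with Little's law (items in the system = arrival rate × time in the system). The sketch below rests on assumptions that are not in this ticket: `txns_per_sec` and `race_probability` are hypothetical inputs that would have to be measured, every omitted participant is assumed to hold its resources until the timeout (the worst case, since a later transaction with a higher txn number can abort it sooner), and the 60-second default is assumed to match the configured transaction lifetime limit.

```python
def steady_state_leaked_participants(
    txns_per_sec: float,
    race_probability: float,
    lifetime_limit_sec: float = 60.0,  # assumed timeout; adjust to the deployment
) -> float:
    """Worst-case average number of omitted participants still holding resources.

    Little's law: L = lambda * W, where lambda is the rate at which participants
    are omitted from commit and W is how long each one holds its resources.
    """
    if not 0.0 <= race_probability <= 1.0:
        raise ValueError("race_probability must be between 0 and 1")
    leak_rate = txns_per_sec * race_probability  # omitted participants per second
    return leak_rate * lifetime_limit_sec
```

For example, 1000 transactions per second with a 1% chance of hitting the race would, under these assumptions, keep roughly 600 omitted participants open on average.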
More information: https://docs.google.com/document/d/1Czt5q5VrTx3mB7rHRVKvZmMjr8hZeH8T2as5DLVOdGQ/edit?usp=sharing
- is related to
  - SERVER-89663 $lookup/$graphLookup may leave open some transactions after commit/abort (Backlog)
- related to
  - SERVER-90367 Track the number of times a $lookup appears within a transaction (Backlog)
  - SERVER-90368 Track the number of times a $lookup appears within a transaction and targets a remote shard (Backlog)