- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: 6.0.0-rc10
- Component/s: Sharding
- Fully Compatible
- ALL
- v6.0
- Sharding 2022-07-25
- 160
SERVER-67016 made it so that when the outer OperationContext is interrupted (e.g. due to a stepdown), the cancellation token used by the transaction API is canceled. This is important to ensure the operations running within the internal transaction are themselves eventually interrupted. However, canceling the cancellation source isn't sufficient to ensure the tasks running on the transaction API's executor have actually finished running. This means it is still possible for the outer OperationContext to be interrupted, for the cancellation source to be canceled, and for SyncTransactionWithRetries::runNoThrow() to return and the server to destroy the original command request before the tasks running on the transaction API's executor have drained.
Take the _configsvrRefineCollectionShardKey command for example. The _configsvrRefineCollectionShardKey command calls ShardingCatalogManager::refineCollectionShardKey() using a ShardKeyPattern with an underlying BSONObj which has its lifetime bound to the _configsvrRefineCollectionShardKey command request. If the OperationContext of the refineCollectionShardKey is interrupted (e.g. via the killOp command), then a task running on the transaction API's executor may continue to refer to the underlying BSONObj's memory even after it has been released.
Instead, the transaction API should additionally wait for the tasks running on the transaction API's executor to have all settled after canceling the cancellation source. This way none of the captures of the lambda callback may still be in use after SyncTransactionWithRetries::runNoThrow() has returned up the stack.
auto txnFuture = _txn->run(std::move(callback));
auto txnResult = txnFuture.getNoThrow(opCtx);
// Cancel the source to guarantee the transaction will terminate if our opCtx was interrupted.
_source.cancel();
// Wait for the tasks on the transaction API's executor to settle before returning.
txnFuture.wait();
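The cancel-then-wait ordering can be illustrated outside the server with a minimal standard-library sketch. Everything here is a hypothetical stand-in rather than the actual transaction API: `runCancelAndDrain()` plays the role of runNoThrow(), the `std::atomic<bool>` plays the role of the cancellation source, and the `std::string` stands in for the request-owned BSONObj that the lambda captures by reference.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <future>
#include <string>
#include <thread>

// Hypothetical analogue of runNoThrow(): start a background task that borrows
// caller-owned memory, cancel it, then wait for it to settle before returning.
bool runCancelAndDrain() {
    std::string request = "{shardKey: {x: 1}}";  // lifetime bound to this frame
    std::atomic<bool> canceled{false};           // stand-in for the cancellation source

    // The task captures `request` by reference, like the lambda callback
    // passed to the transaction API captures the ShardKeyPattern.
    std::future<std::size_t> task =
        std::async(std::launch::async, [&request, &canceled] {
            std::size_t reads = 0;
            while (!canceled.load()) {
                reads += request.size();  // touches the borrowed memory
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
            return reads;
        });

    // Simulate the interruption arriving a little later.
    std::this_thread::sleep_for(std::chrono::milliseconds(10));

    canceled.store(true);  // analogous to _source.cancel(): only a request to stop
    task.wait();           // analogous to txnFuture.wait(): the task has now finished

    // Only after wait() is it safe for `request` to go out of scope.
    return task.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}
```

Note that a future returned by `std::async` also blocks in its destructor, so this sketch would not actually exhibit the use-after-free if `task.wait()` were omitted; the explicit wait is there to show the ordering the fix relies on, where cancellation alone does not imply the task has stopped touching its captures.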
- is related to
  - SERVER-55813 ReshardingDataReplication may still emplace _consistentButStale more than once (Closed)
  - SERVER-67016 Transaction API transactions should be interrupted if their caller is (Closed)
- related to
  - SERVER-68237 Internal session pool may not reuse its session (Closed)