- Type: Task
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Sharding
- Fully Compatible
- Sharding 2021-04-05
- 2
Task executors are allowed to refuse work and the .onCompletion() continuation won't run if the task executor has been shut down. This is especially problematic for the ReshardingCollectionCloner after the changes from SERVER-54959 because the noCursorTimeout cursor will be permanently leaked on stepdown. We should instead be using the RecipientStateMachine::getInstanceCleanupExecutor() to run the .onCompletion() continuation.
- https://github.com/mongodb/mongo/blob/612a3725d98381bf9c0777bcd6b2169cae33f4d1/src/mongo/db/s/resharding/resharding_collection_cloner.cpp#L458
- https://github.com/mongodb/mongo/blob/612a3725d98381bf9c0777bcd6b2169cae33f4d1/src/mongo/db/s/resharding/resharding_txn_cloner.cpp#L398
ReshardingCollectionCloner::run() and ReshardingTxnCloner::run() should be changed to additionally accept the cleanup task executor and should return a SemiFuture<void> so the caller must explicitly do .thenRunOn(**executor) to chain any further continuations.
    .on(executor, cancelToken)
    .thenRunOn(cleanupExecutor)
    .onCompletion([chainCtx](Status status) {
        if (chainCtx->pipeline) {
            // Use a separate Client to make a better effort of calling dispose() even when the
            // CancelationToken has been canceled.
            auto serviceContext = cc().getServiceContext();
            auto clientStrand = ClientStrand::make(
                serviceContext->makeClient("ReshardingCollectionClonerCleanup"));
            auto clientGuard = clientStrand->bind();
            auto opCtx = clientGuard->makeOperationContext();
            chainCtx->pipeline->dispose(opCtx.get());
            chainCtx->pipeline.reset();
        }
        return status;
    })
    .semi();
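The failure mode described in this ticket can be illustrated with a toy model that does not use any MongoDB code. In this sketch, ToyExecutor, ToyCursor, and cursorLeaksAfterStepdown() are hypothetical names invented for illustration; they are not the server's TaskExecutor, cursor, or future APIs. The sketch assumes that the cleanup executor is still accepting work at the moment the main executor has been shut down, which is the property the ticket relies on.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Toy stand-in for a task executor (hypothetical, not MongoDB's TaskExecutor).
class ToyExecutor {
public:
    // Refuses work after shutdown: the task is dropped and never runs.
    bool schedule(std::function<void()> task) {
        if (_shutDown) {
            return false;
        }
        _queue.push_back(std::move(task));
        return true;
    }

    // Stops accepting new work and discards anything still queued.
    void shutdown() {
        _shutDown = true;
        _queue.clear();
    }

    // Runs every task that was accepted before this call.
    void drain() {
        std::vector<std::function<void()>> tasks;
        tasks.swap(_queue);
        for (auto& task : tasks) {
            task();
        }
    }

private:
    bool _shutDown = false;
    std::vector<std::function<void()>> _queue;
};

// Toy resource standing in for the noCursorTimeout cursor (hypothetical).
struct ToyCursor {
    bool open = true;
    void dispose() { open = false; }
};

// Simulates a stepdown that shuts down the work executor before the cleanup
// continuation is scheduled. Returns true if the cursor is still open (leaked).
bool cursorLeaksAfterStepdown(bool useCleanupExecutor) {
    ToyExecutor workExecutor;
    ToyExecutor cleanupExecutor;  // assumed to stay alive through stepdown
    ToyCursor cursor;

    workExecutor.shutdown();  // stepdown

    // Analogue of choosing which executor runs the .onCompletion() continuation.
    ToyExecutor& target = useCleanupExecutor ? cleanupExecutor : workExecutor;
    target.schedule([&cursor] { cursor.dispose(); });

    workExecutor.drain();
    cleanupExecutor.drain();
    return cursor.open;
}
```

Calling cursorLeaksAfterStepdown(false) models the current behavior, where the continuation is handed to the shut-down executor and the cursor stays open. Calling cursorLeaksAfterStepdown(true) models the proposed change, where the continuation is moved to the cleanup executor and dispose() runs.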