- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: 5.0.0
- Component/s: Sharding
- Fully Compatible
- ALL
- v5.0
- Sharding 2021-09-20
- 1
RecipientStateMachine::_runMandatoryCleanup() must wait for _dataReplicationQuiesced to become ready before calling ReshardingMetrics::onStepDown(). Otherwise, a data replication component (e.g. ReshardingCollectionCloner) may still be running and attempt to tick a ReshardingMetrics counter after the metrics have been shut down, which trips the "No operation is in progress" invariant.
ExecutorFuture<void> ReshardingRecipientService::RecipientStateMachine::_runMandatoryCleanup(
    Status status, const CancellationToken& stepdownToken) {
    if (stepdownToken.isCanceled()) {
        // Interrupt occurred, ensure the metrics get shut down.
        _metrics()->onStepDown(ReshardingMetrics::Role::kRecipient);
    }

    return _dataReplicationQuiesced.thenRunOn(_recipientService->getInstanceCleanupExecutor())
        .onCompletion([this, self = shared_from_this(), outerStatus = status](
                          Status dataReplicationHaltStatus) {
            // Wait for all of the data replication components to halt. We ignore any data
            // replication errors because resharding is known to have failed already.
            stdx::lock_guard<Latch> lk(_mutex);
            ensureFulfilledPromise(lk, _completionPromise, outerStatus);
            return outerStatus;
        });
}
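The ordering requirement described above can be illustrated with a minimal, self-contained sketch. It is not the server's code: it uses `std::future` from the standard library rather than MongoDB's `ExecutorFuture`, and `MetricsSketch`, `runCleanupSketch`, and the `midBatch`/`resume` handshake are hypothetical names introduced only for illustration. The worker stands in for a component such as ReshardingCollectionCloner that is paused in the middle of a batch when cleanup begins; `tick()` returns false where the real code would hit the invariant.

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <mutex>

// Hypothetical stand-in for ReshardingMetrics: a tick is only legal while an
// operation is in progress (compare the "_currentOp" invariant in the log).
class MetricsSketch {
public:
    void onStepUp() {
        std::lock_guard<std::mutex> lk(_mutex);
        _opInProgress = true;
    }
    void onStepDown() {
        std::lock_guard<std::mutex> lk(_mutex);
        _opInProgress = false;
    }
    // Returns false where the real code would fail the invariant.
    bool tick() {
        std::lock_guard<std::mutex> lk(_mutex);
        if (!_opInProgress) return false;
        ++_ticks;
        return true;
    }

private:
    std::mutex _mutex;
    bool _opInProgress = false;
    long _ticks = 0;
};

// Returns the number of ticks that arrived after onStepDown().
// waitForQuiesce == true models the ordering this ticket asks for.
int runCleanupSketch(bool waitForQuiesce) {
    MetricsSketch metrics;
    metrics.onStepUp();

    std::promise<void> midBatch;  // worker -> main: "I am inside a batch"
    std::promise<void> resume;    // main -> worker: "finish the batch"
    std::future<void> midBatchReached = midBatch.get_future();
    std::future<void> resumeSignal = resume.get_future();
    std::atomic<int> lateTicks{0};

    // Plays the role of _dataReplicationQuiesced: ready once the worker halts.
    std::future<void> dataReplicationQuiesced = std::async(std::launch::async, [&] {
        midBatch.set_value();
        resumeSignal.wait();
        if (!metrics.tick()) {
            ++lateTicks;
        }
    });

    midBatchReached.wait();  // cleanup starts while a batch is in flight

    if (waitForQuiesce) {
        resume.set_value();
        dataReplicationQuiesced.wait();  // wait for replication to halt first
        metrics.onStepDown();
    } else {
        metrics.onStepDown();  // shut metrics down while the worker still runs
        resume.set_value();
        dataReplicationQuiesced.wait();
    }
    return lateTicks.load();
}
```

Under these assumptions, `runCleanupSketch(true)` reports no late ticks, while `runCleanupSketch(false)` reports one, which corresponds to the invariant failure in the log. On most toolchains the sketch needs to be linked with thread support (for example `-pthread`).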
[js_test:resharding_fuzzer-120e1-1630670493876-3] d20026| 2021-09-03T12:07:52.729+00:00 F ASSERT 23081 [ReshardingRecipientService-2] "Invariant failure","attr":{"expr":"_currentOp","msg":"No operation is in progress","file":"src/mongo/db/s/resharding/resharding_metrics.cpp","line":596}
...
[js_test:resharding_fuzzer-120e1-1630670493876-3] d20026| 2021-09-03T12:07:52.988+00:00 I CONTROL 31445 [ReshardingRecipientService-2] "Frame","attr":{"frame":{"a":"55D0EDE6704C","b":"55D0D9F99000","o":"13ECE04C","s":"_ZN5mongo22invariantFailedWithMsgEPKcRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES1_j","s+":"10C"}}
[js_test:resharding_fuzzer-120e1-1630670493876-3] d20026| 2021-09-03T12:07:52.988+00:00 I CONTROL 31445 [ReshardingRecipientService-2] "Frame","attr":{"frame":{"a":"55D0EA6D26FB","b":"55D0D9F99000","o":"107396FB","s":"_ZN5mongo17ReshardingMetrics30onCollClonerFillBatchForInsertENS_8DurationISt5ratioILl1ELl1000EEEE","s+":"1AB"}}
[js_test:resharding_fuzzer-120e1-1630670493876-3] d20026| 2021-09-03T12:07:52.988+00:00 I CONTROL 31445 [ReshardingRecipientService-2] "Frame","attr":{"frame":{"a":"55D0EA577026","b":"55D0D9F99000","o":"105DE026","s":"_ZN5mongo26ReshardingCollectionCloner10doOneBatchEPNS_16OperationContextERNS_8PipelineE","s+":"E6"}}
[js_test:resharding_fuzzer-120e1-1630670493876-3] d20026| 2021-09-03T12:07:52.988+00:00 I CONTROL 31445 [ReshardingRecipientService-2] "Frame","attr":{"frame":{"a":"55D0EA57B0BF","b":"55D0D9F99000","o":"105E20BF","s":"_ZN5mongo19makeReadyFutureWithIRZNS_26ReshardingCollectionCloner3runESt10shared_ptrINS_8executor12TaskExecutorEES5_NS_17CancellationTokenENS_33CancelableOperationContextFactoryEE3$_4Li0EEENS_6FutureINS_14future_details17UnwrappedTypeImplINSt13invoke_resultIOT_JEE4typeEE4typeEEESF_","s+":"DF"}}
- is depended on by
  - SERVER-53351 Add resharding fuzzer task with step-ups enabled for shards (Closed)
- is related to
  - SERVER-56658 Use the cleanup executor to fulfill resharding participant machine completion promises instead of fulfilling in PrimaryOnlyService::interrupt() (Closed)
  - SERVER-57263 Use resharding metrics stepUp/stepDown logic in the recipient state machine (Closed)