- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: 5.0.0, 5.1.0
- Component/s: Sharding
- Fully Compatible
- ALL
- v5.1, v5.0
- Sharding 2021-11-29
- 1
The ReshardingCoordinator calls ReshardingMetrics::onCompletion() within its resharding::WithAutomaticRetry blocks, so the call can run more than once if a block is retried:
- https://github.com/mongodb/mongo/blob/b4517954a706b9f49b17d423f179113aa8632565/src/mongo/db/s/resharding/resharding_coordinator_service.cpp#L1360
- https://github.com/mongodb/mongo/blob/b4517954a706b9f49b17d423f179113aa8632565/src/mongo/db/s/resharding/resharding_coordinator_service.cpp#L1778
SERVER-56923 commented out an invariant related to ReshardingMetrics::onCompletion() being called multiple times. Until the TODO comment can be addressed by refactoring the ReshardingMetrics class altogether, we should make ReshardingMetrics::onCompletion() itself safe to call multiple times.
{"t":{"$date":"2021-11-14T15:37:37.617+00:00"},"s":"F", "c":"CONTROL", "id":4757800, "ctx":"ReshardingCoordinatorService-2","msg":"Writing fatal message","attr":{"message":"Invalid access at address: 0x2d0"}}
{"t":{"$date":"2021-11-14T15:37:37.617+00:00"},"s":"F", "c":"CONTROL", "id":4757800, "ctx":"ReshardingCoordinatorService-2","msg":"Writing fatal message","attr":{"message":"Got signal: 11 (Segmentation fault).\n"}}
(gdb) bt
#0 mongo::ReshardingMetrics::onCompletion (this=0x557fbfb95b00, role=role@entry=mongo::ReshardingMetrics::kCoordinator, status=mongo::ReshardingOperationStatusEnum::kCanceled, runningOperationEndTime=...) at src/third_party/boost/boost/optional/optional.hpp:1453
#1 0x00007fd16d7cb97a in mongo::markCompleted (status=...) at src/mongo/db/s/resharding/resharding_coordinator_service.cpp:1023
#2 0x00007fd16d7e8df0 in mongo::ReshardingCoordinatorService::ReshardingCoordinator::<lambda(const auto:58&)>::operator()<std::vector<mongo::ReshardingCoordinatorDocument, std::allocator<mongo::ReshardingCoordinatorDocument> > >(const std::vector<mongo::ReshardingCoordinatorDocument, std::allocator<mongo::ReshardingCoordinatorDocument> > &) const (__closure=<optimized out>, coordinatorDocsChangedOnDisk=...) at /opt/mongodbtoolchain/revisions/ba5f698948588cb5da922d3cadee990f5f9f48cd/stow/gcc-v3.pPo/include/c++/8.5.0/bits/atomic_base.h:512
#3 0x00007fd16d7fe9a5 in mongo::unique_function<void(std::vector<mongo::ReshardingCoordinatorDocument, std::allocator<mongo::ReshardingCoordinatorDocument> >)>::callRegularVoid<mongo::ReshardingCoordinatorService::ReshardingCoordinator::_awaitAllParticipantShardsDone(const std::shared_ptr<mongo::executor::ScopedTaskExecutor>&)::<lambda(const auto:58&)> > (args#0=..., f=..., isVoid=...) at src/mongo/util/functional.h:158
#4 mongo::unique_function<void(std::vector<mongo::ReshardingCoordinatorDocument, std::allocator<mongo::ReshardingCoordinatorDocument> >)>::SpecificImpl::call (args#0=..., this=<optimized out>) at src/mongo/util/functional.h:159
#5 mongo::unique_function<void (std::vector<mongo::ReshardingCoordinatorDocument, std::allocator<mongo::ReshardingCoordinatorDocument> >)>::operator()(std::vector<mongo::ReshardingCoordinatorDocument, std::allocator<mongo::ReshardingCoordinatorDocument> >) const (args#0=..., this=<optimized out>) at src/mongo/util/functional.h:109
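One way to make a completion hook safe to call repeatedly is to treat "no operation in flight" as a no-op instead of dereferencing state that an earlier call already cleared. The following is only a minimal sketch of that idea: SketchMetrics, OperationStatus, and demoRetriedCompletion are hypothetical names for illustration and do not reflect the actual ReshardingMetrics interface or the fix that was committed.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <optional>

// Hypothetical, simplified stand-in for a metrics class; NOT the real ReshardingMetrics.
enum class OperationStatus { kSuccess, kFailure, kCanceled };

class SketchMetrics {
public:
    using Clock = std::chrono::steady_clock;

    void onStart(Clock::time_point now) {
        std::lock_guard<std::mutex> lk(_mutex);
        _startTime = now;
    }

    // Returns true if this call recorded the completion, false if there was
    // nothing to complete (for example, a retried block calling it again).
    bool onCompletion(OperationStatus status, Clock::time_point endTime) {
        std::lock_guard<std::mutex> lk(_mutex);
        if (!_startTime) {
            return false;  // Already completed or never started: do nothing.
        }
        _lastDuration = endTime - *_startTime;
        switch (status) {
            case OperationStatus::kSuccess:  ++_succeeded; break;
            case OperationStatus::kFailure:  ++_failed;    break;
            case OperationStatus::kCanceled: ++_canceled;  break;
        }
        _startTime.reset();  // Later calls take the early-return path above.
        return true;
    }

    int canceledCount() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _canceled;
    }

private:
    mutable std::mutex _mutex;
    std::optional<Clock::time_point> _startTime;
    Clock::duration _lastDuration{};
    int _succeeded = 0;
    int _failed = 0;
    int _canceled = 0;
};

// Simulates a retry block that reaches the completion call twice.
int demoRetriedCompletion() {
    SketchMetrics metrics;
    const auto start = SketchMetrics::Clock::now();
    metrics.onStart(start);
    const bool first = metrics.onCompletion(OperationStatus::kCanceled,
                                            start + std::chrono::seconds(5));
    const bool second = metrics.onCompletion(OperationStatus::kCanceled,
                                             start + std::chrono::seconds(9));
    assert(first && !second);
    return metrics.canceledCount();
}
```

In this sketch the second call leaves the counters untouched, so the cancellation is counted once regardless of how many times the surrounding block retries.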
is caused by
- SERVER-56739 Rewrite resharding metrics duration component to allow for resuming from stepup (Closed)
- SERVER-57153 Support co-existing donors/recipients in resharding metrics (Closed)
is related to
- SERVER-56923 Temporarily comment out resharding metrics invariant to allow for metrics-breaking changes to go in until metrics are compatible with stepUp/stepDown (Closed)
related to
- SERVER-61483 Resharding coordinator fails to recover abort decision on step-up, attempts to commit operation as success, leading to data inconsistency (Closed)