- Type: Task
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Sharding
- Cluster Scalability
Currently, the stepdown tests in resharding_donor_service_test.cpp and resharding_recipient_service_test.cpp rely on mocking the coordinator's state changes to make progress.
Issue: The test loop iterates over the states at which to pause and then step down, but it can mock behavior that cannot occur in the real system: it can mock the coordinator's transition to kBlockingWrites before the donor itself has reached kDonatingOplogEntries.
Example:
1. In the test loop, state = DonorStateEnum::kDonatingOplogEntries.
2. At the start of this iteration, ReshardingDonorDocument.state = kDonatingInitialData on disk.
3. The test mocks the coordinator's transition into kApplying.
4. The donor finishes all the necessary work and is ready to transition to 'state', i.e. kDonatingOplogEntries.
5. The test waits until OpObserverForTest::onUpdate witnesses the attempt to transition to kDonatingOplogEntries.
6. The test calls stepDown. The opCtx is interrupted, OpObserverForTest::onUpdate throws, and the write ReshardingDonorDocument.state = kDonatingOplogEntries fails.
7. (Next iteration) state = DonorStateEnum::kBlockingWrites, but ReshardingDonorDocument.state is still kDonatingInitialData on disk.
8. The test mocks the coordinator's transition to kBlockingWrites before the donor is in kDonatingOplogEntries, which is illegal in the real system.
Note:
We want to preserve the behavior that the stepdown occurs before the participant persists its new state to its local ReshardingDocument.
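The sequence in the example can be reproduced with a small standalone model. This is only a sketch, not the actual test code: the enums are simplified stand-ins that contain just the states named in this ticket, and isLegalMock, coordinatorStateFor, and countIllegalMocks are hypothetical helpers. The catchUpFirst option illustrates one possible direction (an assumption, not a decided fix): before each iteration, drive the donor through the earlier states without a stepdown, so the stepdown still happens before the new state is persisted.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-ins for the real enums; only states named in this ticket.
enum class DonorState { kDonatingInitialData, kDonatingOplogEntries, kBlockingWrites };
enum class CoordinatorState { kApplying, kBlockingWrites };

// Coordinator state the test mocks so the donor can move toward `target`.
CoordinatorState coordinatorStateFor(DonorState target) {
    assert(target != DonorState::kDonatingInitialData);  // initial state needs no mock here
    return target == DonorState::kBlockingWrites ? CoordinatorState::kBlockingWrites
                                                 : CoordinatorState::kApplying;
}

// Rule from the ticket: the coordinator may enter kBlockingWrites only once
// the donor has persisted at least kDonatingOplogEntries.
bool isLegalMock(CoordinatorState mocked, DonorState onDisk) {
    if (mocked == CoordinatorState::kBlockingWrites) {
        return onDisk >= DonorState::kDonatingOplogEntries;
    }
    return true;
}

// Models the stepdown loop and counts mocked coordinator transitions that
// would be illegal in the real system.
int countIllegalMocks(bool catchUpFirst) {
    const std::vector<DonorState> targets = {DonorState::kDonatingOplogEntries,
                                             DonorState::kBlockingWrites};
    DonorState onDisk = DonorState::kDonatingInitialData;
    int illegal = 0;

    for (DonorState target : targets) {
        if (catchUpFirst) {
            // Hypothetical fix: persist every earlier state without a stepdown.
            for (DonorState earlier : targets) {
                if (earlier == target) break;
                if (onDisk >= earlier) continue;
                if (!isLegalMock(coordinatorStateFor(earlier), onDisk)) ++illegal;
                onDisk = earlier;  // no stepdown, so the write succeeds
            }
        }
        if (!isLegalMock(coordinatorStateFor(target), onDisk)) ++illegal;
        // stepDown interrupts the write of `target`, so onDisk is unchanged.
    }
    return illegal;
}
```

In this model, countIllegalMocks(false) follows the loop described in the example and reports the illegal kBlockingWrites mock, while countIllegalMocks(true) reports none, and the write of the state under test is still interrupted in every iteration.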