- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Sharding
- Fully Compatible
- ALL
- v6.0, v5.0
- Sharding 2022-06-27, Sharding 2022-07-11
- 163
- 3
The ReshardingTest fixture configures the reshardingPauseCoordinatorBeforeCompletion failpoint with {times: 1}, so the failpoint is automatically disabled as soon as a ReshardingCoordinator reaches it and therefore never actually pauses the ReshardingCoordinator. This is problematic for cases where the reshardCollection command is expected to fail (i.e. tests which use expectedErrorCode !== ErrorCodes.OK), because the primary shard can retry _configsvrReshardCollection, by which point the earlier aborted resharding operation will have been forgotten. This can cause an entire second resharding operation to run. Because it runs entirely after duringReshardingFn has finished executing, it does not abort the way the first resharding operation did.
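The difference between {times: 1} and a failpoint that stays enabled can be illustrated with a toy model. This is only a sketch and not the server's implementation: ToyFailPoint, shouldFail, and pauseWhileSet are hypothetical names, and the model assumes that a pausing thread re-checks the failpoint in a loop until it is no longer active.

```javascript
// Toy model only: names and logic are illustrative assumptions, not MongoDB source.
class ToyFailPoint {
    constructor(mode) {
        // mode is "alwaysOn", "off", or {times: N}.
        this.mode = mode;
    }

    // Returns true if the failpoint is active for this check.
    // In {times: N} mode each successful check consumes one of the N activations.
    shouldFail() {
        if (this.mode === "alwaysOn") {
            return true;
        }
        if (typeof this.mode === "object" && this.mode.times > 0) {
            this.mode = {times: this.mode.times - 1};
            return true;
        }
        return false;
    }
}

// Counts how many times a "paused" thread would wait before continuing.
// maxChecks bounds the loop so the alwaysOn case terminates in this model.
function pauseWhileSet(failPoint, maxChecks) {
    let waits = 0;
    while (waits < maxChecks && failPoint.shouldFail()) {
        waits++;  // a real thread would sleep here before re-checking
    }
    return waits;
}

const timesOne = pauseWhileSet(new ToyFailPoint({times: 1}), 10);
const alwaysOn = pauseWhileSet(new ToyFailPoint("alwaysOn"), 10);
console.log(timesOne, alwaysOn);  // prints "1 10"
```

In this model the {times: 1} failpoint is consumed by the first check, so the coordinator continues almost immediately, while the alwaysOn failpoint keeps it waiting until something turns the failpoint off.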
We should revert the changes made to the ReshardingTest fixture in 38c6aff (part of SERVER-52730) so the ReshardingCoordinator remains paused. This will require a different way to keep the second reshardCollection command run by the resharding_prohibited_commands.js test from getting stuck, which can likely be done by passing data into the reshardingPauseCoordinatorBeforeCompletion failpoint so that it only pauses the ReshardingCoordinator for a particular source namespace.
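As a rough sketch of that idea, the two configureFailPoint command documents might look like the following. The helper function names are made up for illustration, and the "sourceNamespace" data field is a hypothetical name: the ticket does not say what the field would be called, and the server would also need code that reads it.

```javascript
// Hypothetical helpers that only build command documents; nothing is sent to a server.
const kFailPointName = "reshardingPauseCoordinatorBeforeCompletion";

// Current fixture behavior as described above: disabled after the first hit.
function pauseOnceCommand() {
    return {
        configureFailPoint: kFailPointName,
        mode: {times: 1},
    };
}

// Proposed direction: stay enabled, but carry data naming the one source
// namespace to pause for. "sourceNamespace" is an assumed field name.
function pauseForNamespaceCommand(sourceNamespace) {
    return {
        configureFailPoint: kFailPointName,
        mode: "alwaysOn",
        data: {sourceNamespace: sourceNamespace},
    };
}

console.log(JSON.stringify(pauseForNamespaceCommand("reshardingDb.coll")));
```

A test would pass such a document to adminCommand on the config server primary. A second reshardCollection on a different namespace, as in resharding_prohibited_commands.js, would then not be held up by the failpoint.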
We should also revert the test changes to resharding_nonblocking_coordinator_rebuild.js from SERVER-61607, because until now I hadn't realized that the behavior of the reshardingPauseCoordinatorBeforeCompletion failpoint was the culprit.
- is related to
- SERVER-52730 Restrict there to be at most one resharding operation active in the whole cluster
- Closed
- SERVER-61607 Accept DuplicateKey as a possible error in resharding_nonblocking_coordinator_rebuild.js
- Closed
- related to
- SERVER-73916 Improve ReshardingTest fixture error reporting when reshardCollection has already failed before any failpoints are waited on
- Closed