- Type: Bug
- Resolution: Duplicate
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: Sharding
- None
- Sharding NYC
- ALL
- 2
The abortReshardCollection command triggers a shard to refresh using the _flushReshardingStateChange command. The _flushReshardingStateChange command first acquires the database and collection locks to check whether the critical section is held and, if it is not, acquires these locks again as part of onShardVersionMismatch(). These lock acquisitions can block if a strong lock request is already enqueued on the shard. However, writes being stalled by that strong lock may be the reason the user ran abortReshardCollection in the first place. Because abortReshardCollection then waits for the strong lock request to be granted and released, an end user would additionally need to run killOp on operations from internal (system) threads for the server to make forward progress, which undermines the utility of the abortReshardCollection command.
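The blocking described above can be illustrated with a toy model. This is only a sketch, assuming a strictly first-in-first-out lock queue with two modes, shared ("S") and exclusive ("X"); the FairLock class and the owner names are hypothetical, and the server's real lock manager has more modes and more nuanced queueing rules.

```python
from collections import deque


class FairLock:
    """Toy FIFO lock (assumption, not the server's lock manager).

    "S" requests are compatible with each other; "X" conflicts with everything.
    """

    def __init__(self):
        self.granted = []       # (owner, mode) pairs currently holding the lock
        self.waiting = deque()  # FIFO queue of (owner, mode) pairs

    def _compatible(self, mode):
        if not self.granted:
            return True
        return mode == "S" and all(m == "S" for _, m in self.granted)

    def acquire(self, owner, mode):
        """Return True if granted immediately, False if the request was enqueued."""
        # A new request also waits behind anything already queued, which is
        # what lets a pending "X" request stall later "S" requests.
        if not self.waiting and self._compatible(mode):
            self.granted.append((owner, mode))
            return True
        self.waiting.append((owner, mode))
        return False

    def release(self, owner):
        self.granted = [(o, m) for o, m in self.granted if o != owner]
        # Grant queued requests in order for as long as the head is compatible.
        while self.waiting and self._compatible(self.waiting[0][1]):
            self.granted.append(self.waiting.popleft())

    def holders(self):
        return [o for o, _ in self.granted]


def simulate():
    lock = FairLock()
    results = [
        lock.acquire("long_running_op", "S"),  # granted immediately
        lock.acquire("internal_thread", "X"),  # strong lock request, enqueued
        lock.acquire("flush_for_abort", "S"),  # stuck behind the strong request
    ]
    lock.release("long_running_op")
    after_first_release = lock.holders()
    lock.release("internal_thread")
    after_second_release = lock.holders()
    return results, after_first_release, after_second_release
```

In this model the flush request is granted only after the strong lock has been both granted and released, which mirrors the wait described in the paragraph above.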
We should instead have an explicit {_shardsvrAbortReshardCollection: <reshardingUUID>} command that interacts with the DonorStateMachines and RecipientStateMachines directly. Note that the coordinator's decision is irreversible, so 'pushing' the decision out, as opposed to having the participant shards 'pull' it via a shard version refresh, is still safe in the presence of delayed messages.
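The safety argument for 'pushing' the decision can be sketched as follows. This is a minimal illustration, assuming the abort is a one-way, idempotent transition; ParticipantStateMachine, Shard, and handle_abort are hypothetical names for this sketch and do not reflect the server's actual classes or the eventual command implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ParticipantStateMachine:
    """Hypothetical stand-in for a donor or recipient state machine."""
    name: str
    state: str = "running"

    def abort(self):
        """Move to 'aborted'; return True only if the state actually changed."""
        if self.state == "aborted":
            return False
        self.state = "aborted"
        return True


@dataclass
class Shard:
    # Maps a resharding UUID to the state machines participating in it.
    machines: dict = field(default_factory=dict)

    def handle_abort(self, resharding_uuid):
        """Hypothetical handler for a pushed abort decision.

        Talks to the state machines directly and takes no locks in this toy
        model. Returns how many state machines changed state.
        """
        participants = self.machines.get(resharding_uuid, [])
        return sum(1 for machine in participants if machine.abort())
```

Because the transition only ever goes one way, a duplicated or late delivery of the same message changes nothing, and a message for an unknown resharding UUID is a no-op in this sketch.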
- duplicates
  - SERVER-56638 Fix flushReshardingStateChanges critical section race (Closed)
- is related to
  - SERVER-53258 [Resharding] Reject writes in opObserver when disallowWritesForResharding is true (Closed)
  - SERVER-54474 Introduce the _flushReshardingStateChange command (Closed)