-
Type: Bug
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: 5.0.0, 6.0.0, 7.0.0, 7.2.0, 8.0.0-rc0, 7.3.0
-
Component/s: None
-
None
-
Cluster Scalability
-
Fully Compatible
-
ALL
-
v8.0, v7.3, v7.0, v6.0, v5.0
-
Cluster Scalability 2024-4-29, Cluster Scalability 2024-5-13, Cluster Scalability 2024-5-27
-
105
When the resharding coordinator aborts, it performs the following steps:
1. Transition the state document to kAbort.
2. Send the _shardsvrAbortReshardCollection command to the participants.
3. Proceed with cleaning up the resharding temporary collection metadata.
However, by the time (3) executes, there is no guarantee that the shards have seen the transition to kAbort (1). This is because (2) only clears the filtering metadata on the primary nodes (and issues a best-effort asynchronous sharding metadata refresh, which in turn asynchronously flushes the ShardServerCatalogCacheLoader). This becomes a problem if a shard fails over and the newly elected primary (a former secondary) is not yet aware of kAbort.
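The race described above can be illustrated with a toy model. This is only a sketch in Python, not MongoDB code: `Node`, `Shard`, `coordinator_abort`, and the `"pre-abort"` placeholder state are hypothetical names introduced for illustration, and the asynchronous flush is modelled simply as "has not happened yet".

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One replica-set member of a shard (toy model, hypothetical)."""
    name: str
    # Last coordinator state this node has seen; "pre-abort" is a placeholder.
    seen_state: str = "pre-abort"


@dataclass
class Shard:
    primary: Node
    secondary: Node

    def handle_abort_command(self):
        # Models step (2): only the primary notices the abort. The flush that
        # would carry it to the secondary is asynchronous and has not run yet.
        self.primary.seen_state = "kAbort"

    def fail_over(self):
        # The former secondary becomes the primary.
        self.primary, self.secondary = self.secondary, self.primary


def coordinator_abort(shard, failover_before_cleanup):
    """Run steps (1)-(3) and return what the shard's primary has seen
    at the moment step (3) starts."""
    # Step (1): the coordinator state document moves to kAbort (not modelled).
    # Step (2): tell the participant to abort.
    shard.handle_abort_command()
    if failover_before_cleanup:
        shard.fail_over()
    # Step (3) would clean up the temporary collection metadata here.
    return shard.primary.seen_state
```

In this model, a failover between (2) and (3) leaves the shard with a primary that still holds the pre-abort view while cleanup proceeds; without the failover, the primary has seen kAbort.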
One solution could be to make (2) perform this sharding metadata refresh and a durable (majority) flush of the ShardServerCatalogCacheLoader.
Another solution, which is more in line with other callers of _updateCoordinatorDocStateAndCatalogEntries, would be to call _tellAllDonorsToRefresh() (and possibly _tellAllRecipientsToRefresh() too?) right after this line in the abort procedure.
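Either proposal amounts to inserting a synchronization point between (2) and (3). A minimal sketch of that ordering, again as a toy Python model rather than server code: `run_abort` and `safe_to_clean_up` are hypothetical helpers, `"pre-abort"` is a placeholder state, and the refresh plus majority flush is simplified to "every node has seen kAbort".

```python
ABORT = "kAbort"


def run_abort(replica_states, refresh_before_cleanup):
    """Return each node's last-seen state at the moment step (3) would start.

    replica_states maps a node name to the last coordinator state that node
    has durably seen. The first key is treated as the primary.
    """
    states = dict(replica_states)  # work on a copy, leave the input untouched
    primary = next(iter(states))
    # Steps (1) and (2): the coordinator moves to kAbort and notifies the
    # participants; only the primary is guaranteed to notice.
    states[primary] = ABORT
    if refresh_before_cleanup:
        # Hypothetical stand-in for a refresh followed by a majority-durable
        # flush, simplified here to reaching every node before continuing.
        for name in states:
            states[name] = ABORT
    # Step (3), the metadata cleanup, would start here.
    return states


def safe_to_clean_up(states):
    """True when no node that could become primary is unaware of the abort."""
    return all(state == ABORT for state in states.values())
```

For example, with `before = {"primary": "pre-abort", "secondary": "pre-abort"}`, `safe_to_clean_up(run_abort(before, False))` is false and `safe_to_clean_up(run_abort(before, True))` is true. Which of the two proposals provides that barrier in the real code is exactly the open question of this ticket.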
- is related to
-
SERVER-90810 Resharding recipient shard can install stale filtering information for the resharding temporary collection when aborting
- Backlog