- Type: Bug
- Resolution: Duplicate
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Catalog and Routing
- ALL
- CAR Team 2024-08-05
As described for a particular case in SERVER-87927, there is a possible race where a movePrimary interleaves with the setFeatureCompatibilityVersion upgrade / downgrade checks & cleanup actions and bypasses them. To break this down (a rough repro sketch follows the numbered steps):
1. Let's say we have a collection living on Shard A that would trigger the checks in _userCollectionsUassertsForDowngrade(...).
2. An FCV downgrade command is received, and the config server tells both shards to start downgrading.
3. Shard A and Shard B both reach the "transitioning" state.
4. A movePrimary operation starts to copy the problematic collection from Shard A to Shard B.
5. The movePrimary runs before Shard A performs the user collection checks, but after Shard B has already completed them. The collection is therefore copied from Shard A to Shard B without being caught on Shard B's end, since Shard B's checks ran before the collection arrived.
6. After the movePrimary, Shard A runs the user collection checks. But there's nothing to check for (the collection was already migrated), so it passes.
7. Therefore the FCV downgrade completes without triggering any checks.
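As a rough illustration of the sequence above, here is a timing-dependent sketch using pymongo against a two-shard cluster. The connection string, database name, and shard names ("testdb", "shardA", "shardB") are assumptions, and without failpoints the race window is narrow, so this only shows the command interleaving rather than a deterministic reproduction.

```python
# Hypothetical, timing-dependent sketch of the interleaving described above.
# Assumes a running sharded cluster reachable via mongos at MONGOS_URI, two
# shards named "shardA" and "shardB", and a collection in "testdb" (primary
# shard: shardA) whose options would normally trip
# _userCollectionsUassertsForDowngrade. Errors are left unhandled for brevity.
import threading
from pymongo import MongoClient

MONGOS_URI = "mongodb://localhost:27017"  # assumption: mongos endpoint
client = MongoClient(MONGOS_URI)


def start_fcv_downgrade():
    # Steps 2-3: the config server drives both shards into the
    # "transitioning" state before the per-shard checks and cleanup run.
    client.admin.command("setFeatureCompatibilityVersion", "7.0", confirm=True)


def run_move_primary():
    # Steps 4-5: while the shards are transitioning, move the database whose
    # primary shard (shardA) owns the problematic collection over to shardB,
    # which may have already finished its user-collection checks.
    client.admin.command("movePrimary", "testdb", to="shardB")


downgrade = threading.Thread(target=start_fcv_downgrade)
move = threading.Thread(target=run_move_primary)
downgrade.start()
move.start()  # races with the downgrade's per-shard checks
downgrade.join()
move.join()

# Step 7: if the interleaving hits, the downgrade reports success even though
# the copied collection was never checked or cleaned up on shardB.
print(client.admin.command("getParameter", 1, featureCompatibilityVersion=1))
```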
We may have to rethink when and where we call _userCollectionsUassertsForDowngrade and _internalServerCleanupForDowngrade, or consider whether we need to disallow migrations while we're in the "transitioning" FCV state. While the example above uses movePrimary, as part of this ticket we should make sure such a bug isn't possible with the other migration mechanisms we have, like moveChunk or resharding.
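If the fix goes the route of rejecting migrations during the transition, note that the "transitioning" state is visible in the FCV document stored in admin.system.version: while a setFeatureCompatibilityVersion is in flight, the document carries a targetVersion field in addition to version. A minimal sketch of reading that state (the connection string is an assumption):

```python
# Minimal sketch: inspect the FCV document in admin.system.version. While an
# upgrade/downgrade is in flight, the document has a targetVersion field in
# addition to version; once setFCV completes, only version remains.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumption: mongos/node
fcv_doc = client.admin["system.version"].find_one(
    {"_id": "featureCompatibilityVersion"}
)


def is_transitioning(doc):
    # targetVersion is present only while setFCV is partway through.
    return doc is not None and "targetVersion" in doc


print("FCV document:", fcv_doc)
print("transitioning:", is_transitioning(fcv_doc))
```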
This bug was found when I added code to the _internalServerCleanupForDowngrade() function.
It also seems that SERVER-87927 is a particular instance of this bug.
- duplicates
  - SERVER-91702 Removal of recordIdsReplicated leaves inconsistent metadata on downgrade for sharded clusters (Backlog)
- is related to
  - SERVER-91702 Removal of recordIdsReplicated leaves inconsistent metadata on downgrade for sharded clusters (Backlog)
  - SERVER-87927 movePrimary + FCV downgrade race could potentially result in timeseriesBucketingParametersHaveChanged existing on 7.0 (Closed)
  - SERVER-88238 checkMetadataConsistency interleaves with collMod during upgrade / downgrade (Closed)
- related to
  - SERVER-90094 collMods occurring as part of setFCV may lose some collections in sharded clusters (Closed)
  - SERVER-89634 Enable tests that concurrently perform DDL and setFCV operations on the recordIdsReplicated:true variant (Closed)