The config.rangeDeletions collection stores a document of the following form for each range deletion task that still needs to be performed:
{
    "_id" : UUID("78447b8a-84d6-4555-a97d-b4bc2d708e29"),
    "nss" : "test4_fsmdb0.fsmcoll0_29",
    "collectionUuid" : UUID("1eafdb97-2114-4b72-a6ce-34e96aa3df1a"),
    "donorShardId" : "shard-rs1",
    "range" : { "min" : { "a" : 400 }, "max" : { "a" : { "$maxKey" : 1 } } },
    "whenToClean" : "delayed"
}
The "range" field of this document is left unmodified after a user successfully runs the refineCollectionShardKey command. As a result, if the following sequence of events occurs, a "RangeOverlapConflict: Requested deletion range overlaps a live shard chunk" error prevents the range deleter from ever deleting the orphaned documents in that range.
- The test4_fsmdb0.fsmcoll0_29 collection is sharded on {a: 1} and has chunks:
  - {a: MinKey} -> {a: 100}
  - {a: 100} -> {a: 200}
  - {a: 200} -> {a: 300}
  - {a: 300} -> {a: 400}
  - {a: 400} -> {a: MaxKey}
- A migration begins for chunk {a: 400} -> {a: MaxKey} from shard-rs1 to shard-rs0.
- The migration completes successfully, but the primary of shard-rs1 steps down before the range deletion task completes.
- A user runs the refineCollectionShardKey command and changes the shard key of the test4_fsmdb0.fsmcoll0_29 collection to {a: 1, b: 1}. The chunk bounds are therefore augmented to be:
  - {a: MinKey, b: MinKey} -> {a: 100, b: MinKey}
  - {a: 100, b: MinKey} -> {a: 200, b: MinKey}
  - {a: 200, b: MinKey} -> {a: 300, b: MinKey}
  - {a: 300, b: MinKey} -> {a: 400, b: MinKey}
  - {a: 400, b: MinKey} -> {a: MaxKey, b: MaxKey}
- The newly elected primary of shard-rs1 attempts to schedule the range deletion task still present in its config.rangeDeletions collection. Because {a: 400} < {a: 400, b: MinKey}, the stored range {a: 400} -> {a: MaxKey} is considered to partially overlap the chunk {a: 300, b: MinKey} -> {a: 400, b: MinKey}. If shard-rs1 happens to own that chunk, it refuses to perform the range deletion for {a: 400, b: MinKey} -> {a: MaxKey, b: MaxKey}.
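
The overlap in the last step can be reproduced with a small Python sketch. This is a simplified model rather than server code: each shard key is a tuple of (rank, value) pairs, so Python's tuple ordering stands in for the comparison described above, where a key that is a strict prefix of another sorts first. The names MIN_KEY, MAX_KEY, num and overlaps are hypothetical helpers introduced only for this illustration.

```python
# Simplified stand-in for shard key ordering (an assumption, not server code):
# MinKey sorts below every number and MaxKey above, and a key that is a strict
# prefix of a longer key sorts before it, e.g. {a: 400} < {a: 400, b: MinKey}.
MIN_KEY = (0, 0)
MAX_KEY = (2, 0)


def num(n):
    """Wrap a number so it sorts between MIN_KEY and MAX_KEY."""
    return (1, n)


def overlaps(r1, r2):
    """Return True if two half-open [min, max) ranges intersect."""
    return r1[0] < r2[1] and r2[0] < r1[1]


# Range as stored in config.rangeDeletions, still in the old {a: 1} pattern.
stale_range = ((num(400),), (MAX_KEY,))

# Live chunk {a: 300, b: MinKey} -> {a: 400, b: MinKey} after the refine.
live_chunk = ((num(300), MIN_KEY), (num(400), MIN_KEY))

# The same deletion range with bounds extended to the refined key pattern.
refined_range = ((num(400), MIN_KEY), (MAX_KEY, MAX_KEY))

print(overlaps(stale_range, live_chunk))    # True: the stale lower bound falls inside the chunk
print(overlaps(refined_range, live_chunk))  # False: the bounds only touch at {a: 400, b: MinKey}
```

In this model the conflict disappears once the stored bounds use the refined key pattern, which suggests the stale "range" field is what triggers the error.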
- is depended on by: SERVER-48198 Migration recovery may recover incorrect decision after shard key refine (Closed)
- is related to: SERVER-46370 Correctly maintain receiving chunks list after shard key refine (Closed)
- is related to: SERVER-42192 Write a concurrency workload to test that orphaned ranges are always deleted and nothing that shouldn’t be deleted gets deleted (Closed)