-
Type: Task
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: Sharding
-
Fully Compatible
-
Sharding 2021-04-05, Sharding EMEA 2021-05-03
If we are creating a collection the following scenario might happen:
- We start sharding an empty collection with a hashed key index
- There is a stepdown
- A write sneaks in before resuming the coordinator
- The new primary starts the phase two, but we cannot shard the collection with a hashed index that already has data so the shard collection fails
In order to ensure there is no leftover data at command termination, a drop index was added to the create collection path that will be executed only if the index was not already on the collection and we are recovering from a step down, however, this might generate the following scenario:
Imagine we have a 10TB collection and we try to shard the collection:
- The request would block for 15 min in the critical section to create an index
- The server is overwhelmed with parked network requests which block on the CS and crashes
- The server comes back up and resumes shard collection, same thing happens over and over, because we drop the index and then recreate it again.
We must remove this drop index operation, in order to do so, we could, for example, use the same approach of resharding, by preventing writes on the collections on step up.
- causes
-
SERVER-56342 Throw an exception if the update on config.collections of the shardCollection operation doesn't suceed
- Closed
- depends on
-
SERVER-54587 Make create collection resilient to stepdowns
- Closed
-
SERVER-55494 Retake the collection critical section on step up (in drain mode)
- Closed
- is depended on by
-
SERVER-55969 Stop checking for DisableIncompleteShardingDDLSupport future flag in create collection
- Closed
- is duplicated by
-
SERVER-55485 Add create collection parameters when reporting for current op
- Closed