- Type: Improvement
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Sharding
- Cluster Scalability
- Fully Compatible
- v8.0, v7.3, v7.0, v6.0
- Cluster Scalability Priorities
- 200
Using a default numInitialChunks value of 90 will help prevent the following scenarios:
1. Starting in 6.0.3, sharding a collection creates only 1 chunk, and the auto-splitter only splits a chunk when it can be moved to another shard. If a collection is resharded while it still has only 1 chunk, resharding requests only 1 chunk, so all the data is written to one shard.
2. Resharding uses $sample to fetch numInitialChunks * 10 documents and derives the collection's split points from them. When numInitialChunks is low, the sample is often too small to reflect how the data is actually distributed. The result is an uneven distribution after resharding, and chunk migration has to kick in to rebalance the data.
3. When numInitialChunks is very large (the default is the existing number of chunks in the collection), the getMore for $sample can fail with out-of-memory errors, or $sample cannot find enough documents with unique shard key values.
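The scenarios above can be illustrated with a simplified model of sampling-based split-point selection. This is only a sketch in Python, not the server's implementation: the function names `pick_split_points` and `chunk_sizes` are hypothetical, shard keys are modeled as unique integers, and the only detail taken from the description is the sample size of numInitialChunks * 10.

```python
import bisect
import random

# From the description: resharding samples numInitialChunks * 10 documents.
SAMPLES_PER_CHUNK = 10


def pick_split_points(keys, num_initial_chunks, rng):
    """Hypothetical model: sample keys and return num_initial_chunks - 1 split points.

    Assumes `keys` is a list of unique, sortable shard key values.
    """
    sample_size = min(len(keys), num_initial_chunks * SAMPLES_PER_CHUNK)
    if sample_size < num_initial_chunks:
        # Loosely mirrors scenario 3: not enough unique keys to sample.
        raise ValueError("not enough unique keys for the requested chunk count")
    sample = sorted(rng.sample(keys, sample_size))
    # Use evenly spaced sampled keys as chunk boundaries.
    step = len(sample) // num_initial_chunks
    return [sample[i * step] for i in range(1, num_initial_chunks)]


def chunk_sizes(keys, split_points):
    """Count how many keys land in each chunk; a key equal to a split
    point belongs to the chunk that starts at that point."""
    sizes = [0] * (len(split_points) + 1)
    for key in keys:
        sizes[bisect.bisect_right(split_points, key)] += 1
    return sizes


if __name__ == "__main__":
    rng = random.Random(0)
    keys = list(range(100_000))
    for n in (1, 4, 90):
        sizes = chunk_sizes(keys, pick_split_points(keys, n, rng))
        print(n, "chunks -> smallest:", min(sizes), "largest:", max(sizes))
```

With `num_initial_chunks=1` the model produces no split points, so every key lands in a single chunk (scenario 1). Running it with small values such as 4 gives a feel for how few samples back each boundary (scenario 2).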
is related to:
- SERVER-96485 Fix numInitialChunks value for ReshardCollection genny perf test (Closed)
- SERVER-78841 Make the number of samples per chunk in the SamplingBasedInitialSplitPolicy configurable (Backlog)
- SERVER-68050 Change resharding split policy to create one chunk per shard by default (Blocked)
related to:
- SERVER-95773 Resharding does not need to sample documents if the key is hashed (Backlog)