- Type: Bug
- Resolution: Won't Fix
- Priority: Major - P3
- None
- Affects Version/s: 5.0.0
- Component/s: Sharding
- Cluster Scalability
- ALL
- (copied to CRM)
Chunk Migration Concurrency >1 is incompatible with attemptToBalanceJumboChunks
Context:
If 'attemptToBalanceJumboChunks' is set to true, the balancer will schedule migrations that attempt to move large chunks as long as the chunk is not marked 'jumbo' in config.chunks. A chunk is marked 'jumbo' only after an attempt to split or move a large chunk has failed because of its size or the size of the transfer mods queue.
If a shard is in draining mode, meaning it is in the process of being removed, the balancer will also attempt to schedule migrations of any large chunks currently belonging to this shard. In that case the balancer behaves the same as if 'attemptToBalanceJumboChunks' were set to true.
The fetch code path for jumbo chunks is not thread-safe, yet setting chunkMigrationConcurrency to a value greater than 1 does not put any restrictions in place for jumbo chunks.
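The scheduling rule described above can be sketched as a small predicate. This is a toy model of the description in this ticket, not the server's balancer code; the function and parameter names are hypothetical.

```python
def balancer_may_move_large_chunk(
    attempt_to_balance_jumbo_chunks: bool,
    marked_jumbo: bool,
    shard_draining: bool,
) -> bool:
    """Toy model of the rule in this ticket (hypothetical names, not server code)."""
    # A draining shard is treated as if attemptToBalanceJumboChunks were true.
    effective_setting = attempt_to_balance_jumbo_chunks or shard_draining
    # Chunks already marked 'jumbo' in config.chunks are not scheduled.
    return effective_setting and not marked_jumbo
```

Under this model, a large chunk on a draining shard is eligible for migration even when the setting is false, as long as it has not yet been marked 'jumbo'.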
https://github.com/mongodb/mongo/blob/master/src/mongo/db/s/migration_chunk_cloner_source.cpp#L680
Additional information here on how attemptToBalanceJumboChunks setting works with Chunk Migration.
A reproducer is attached.
Impact:
Hitting the invariant causes the affected servers to crash and restart.
The failed chunk migration does not corrupt data.
Workaround:
1. Turn off chunk migration entirely, or
2. Disable balancing of jumbo chunks (set attemptToBalanceJumboChunks to false) while keeping migration on.
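A minimal sketch of how the two workarounds might be applied from a driver. It assumes that "turning off migration" means stopping the balancer with the `balancerStop` admin command, and that the jumbo setting lives in the `balancer` document of `config.settings`. The helper names are hypothetical, and `client` is assumed to be a `pymongo.MongoClient` connected to a mongos.

```python
from typing import Any, Dict, Tuple


def disable_jumbo_balancing_update() -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Filter and update for config.settings that turn the setting off."""
    return (
        {"_id": "balancer"},
        {"$set": {"attemptToBalanceJumboChunks": False}},
    )


def stop_balancer_command() -> Dict[str, Any]:
    """Admin command that stops the balancer (assumed meaning of workaround 1)."""
    return {"balancerStop": 1}


def apply_workaround(client: Any, stop_balancer: bool = False) -> None:
    """Apply one workaround; client is assumed to be a pymongo.MongoClient."""
    if stop_balancer:
        # Workaround 1: no migrations are scheduled at all.
        client.admin.command(stop_balancer_command())
    else:
        # Workaround 2: keep migrations on, but stop scheduling large chunks.
        flt, upd = disable_jumbo_balancing_update()
        client["config"]["settings"].update_one(flt, upd, upsert=True)
```

Per the context above, a draining shard is handled as if the setting were true, so the second workaround may not cover clusters that are removing a shard.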
Note: 6 clusters on Atlas have attemptToBalanceJumboChunks set to true.
- is related to: SERVER-95324 Make CMConcurrency a no-op. (Closed)