Issue
If there is at least one huge chunk on a shard being drained, the balancer may loop indefinitely through the following steps:
- The migration of the huge chunk runs for 6 hours before being aborted
- The same migration is rescheduled
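The loop above can be illustrated with a toy model. This is only a sketch: the 6-hour limit comes from this ticket, while `drain_attempts`, `chunk_clone_hours`, `max_attempts`, and the assumption that every retry starts cloning from scratch are hypothetical and are not taken from the server code.

```python
ABORT_AFTER_HOURS = 6  # abort threshold quoted in this ticket


def drain_attempts(chunk_clone_hours, max_attempts=5):
    """Toy model: list the outcome of each scheduling round for one chunk.

    Assumes (hypothetically) that a retried migration restarts from zero,
    so the time needed to clone the chunk is the same on every attempt.
    """
    outcomes = []
    for _ in range(max_attempts):
        if chunk_clone_hours <= ABORT_AFTER_HOURS:
            outcomes.append("committed")
            break
        # The migration hits the time limit, is aborted, and is rescheduled.
        outcomes.append("aborted")
    return outcomes
```

Under these assumptions, a chunk that needs more than 6 hours to clone is aborted on every round, no matter how many rounds are allowed.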
Technical description
When draining a shard, the balancer schedules migrations with the forceJumbo flag set to true (meaning they can proceed no matter the number of documents to clone) and passes the whole chunk entry as argument (meaning that the whole chunk must be migrated in one shot).
This differs from the usual balancing behavior, which, since the removal of the auto-splitter in 6.0.3, consists in issuing moveRange commands that specify only the min bound, so that the shard autonomously decides at which key to split the chunk according to the configured chunk size.
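The difference between the two request shapes can be sketched as plain command documents. This is an illustration only: the field names follow the user-facing `moveRange` command, whereas the helper names, the namespace, the shard key, and the shard name are hypothetical, and the balancer's internal request may differ.

```python
def whole_chunk_move(ns, chunk, to_shard):
    """Draining-style request: both bounds are fixed and forceJumbo is set,
    so the whole chunk has to be migrated in one shot."""
    return {
        "moveRange": ns,
        "min": chunk["min"],
        "max": chunk["max"],
        "toShard": to_shard,
        "forceJumbo": True,
    }


def min_only_move(ns, chunk, to_shard):
    """Regular balancing-style request: only the min bound is given, leaving
    the donor shard free to pick the upper bound from the chunk size."""
    return {
        "moveRange": ns,
        "min": chunk["min"],
        "toShard": to_shard,
    }


# Hypothetical chunk on a collection sharded on "x".
chunk = {"min": {"x": 0}, "max": {"x": 1_000_000}}
draining_cmd = whole_chunk_move("test.coll", chunk, "shard1")
balancing_cmd = min_only_move("test.coll", chunk, "shard1")
```

With only `min` specified, a huge chunk can be moved piece by piece across several rounds; with both bounds fixed, each attempt has to clone the entire range.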
Is caused by: SERVER-71787 Balancer needs to attach forceJumbo to `moveRange` command (Closed)