In the non-resumable range deleter protocol, a shard tracks the chunks it is currently receiving in an in-memory map and removes a range when its migration succeeds or fails. A refine during a migration (which can only happen if the migration runs without the distributed lock) can leave a range incorrectly in this map after the migration aborts, in at least two places:
- When setting new filtering metadata in the MetadataManager, we clear entries from the receiving chunks list that overlap with the new metadata. This comparison goes through the ChunkManager, which compares ranges using key strings. That doesn't work correctly for ranges with different numbers of fields, as is the case after a refine.
- In MigrationDestinationManager::_forgetReceive(), we don't remove a chunk from the receiving range list if the epoch has changed. A refine during the migration changes the epoch, so this check gives the wrong answer in that case and we may fail to clear the received range from the receiving chunks.
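
The second case can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the server code: `ToyShard`, `beginReceive`, `refineShardKey`, `forgetReceive` and the two helper functions are hypothetical names, ranges are reduced to integer bounds, and a refine is modeled as nothing more than an epoch change.

```cpp
// Toy model only: every type and function here is hypothetical and
// merely imitates the behavior described in the ticket.
#include <cassert>
#include <map>

// Stand-in for a collection epoch. Assumption: a refine assigns a new one.
using Epoch = int;

struct ToyShard {
    Epoch collectionEpoch = 1;
    // Ranges currently being received: min bound -> max bound.
    std::map<int, int> receivingChunks;

    void beginReceive(int min, int max) {
        assert(min < max);
        receivingChunks[min] = max;
    }

    // Models a refine as an epoch change and nothing else.
    void refineShardKey() { ++collectionEpoch; }

    // Imitates the described check: skip the removal if the epoch changed.
    void forgetReceive(int min, Epoch epochAtStart) {
        if (collectionEpoch != epochAtStart) {
            return;  // the range stays in the map
        }
        receivingChunks.erase(min);
    }
};

// True if the range is still tracked after a migration that aborted
// following a refine.
bool rangeLeaksAfterRefine() {
    ToyShard shard;
    const Epoch epochAtStart = shard.collectionEpoch;
    shard.beginReceive(0, 10);
    shard.refineShardKey();                // refine during the migration
    shard.forgetReceive(0, epochAtStart);  // migration aborts
    return shard.receivingChunks.count(0) == 1;
}

// Control case: without a refine, the same abort clears the range.
bool rangeLeaksWithoutRefine() {
    ToyShard shard;
    const Epoch epochAtStart = shard.collectionEpoch;
    shard.beginReceive(0, 10);
    shard.forgetReceive(0, epochAtStart);
    return shard.receivingChunks.count(0) == 1;
}
```

In this model, `rangeLeaksAfterRefine()` returns true while `rangeLeaksWithoutRefine()` returns false, which matches the failure mode described above: the epoch guard alone decides whether the aborted range is forgotten.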
Related to: SERVER-46386 Refining a shard key may lead to an orphan range never being cleaned up (Closed)