A migration recipient may start cloning data that overlaps with an ongoing range deletion if the filtering metadata was cleared before it started receiving the migration.
1. Consider that we have an ongoing range deletion (e.g. we donated a chunk).
2. For whatever reason (e.g. a failed metadata refresh), the filtering metadata gets cleared.
3. Now we start receiving a chunk that overlaps that range deletion (i.e. the same chunk we recently donated away).
4. MigrationDestinationManager sees that there is an existing overlapping range deletion document, so it attempts to wait for the range deletion task to finish through the CSR (CollectionShardingRuntime). However, because the metadata was cleared in step (2), the current metadata is not aware of that range deletion, so 'waitForClean' returns OK right away.
5. MigrationDestinationManager then begins cloning documents even though the range deletion is still ongoing and may delete them, which causes data loss.
This regression was introduced by SERVER-52906, which changed this 'while' to an 'if'.
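The difference between the 'if' and the 'while' can be illustrated with a toy model. This is only a sketch under stated assumptions, not the server code: `ToyShard`, `clone_with_if`, `clone_with_while`, `max_attempts` and `between_attempts` are all hypothetical names, the deadline is modelled as a simple attempt counter, and the metadata recovery between attempts stands in for whatever reinstalls the filtering metadata in practice.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShard:
    """Toy stand-in for a recipient shard; every name here is hypothetical."""

    # Persisted range deletion documents, as (min, max) half-open ranges.
    persisted_deletions: set = field(default_factory=set)
    # Range deletions the in-memory filtering metadata is aware of.
    tracked_deletions: set = field(default_factory=set)

    def has_overlapping_deletion_doc(self, rng):
        lo, hi = rng
        return any(lo < d_hi and d_lo < hi for d_lo, d_hi in self.persisted_deletions)

    def wait_for_clean(self, rng):
        """Waits only for deletions the in-memory metadata knows about."""
        lo, hi = rng
        overlapping = {d for d in self.tracked_deletions if lo < d[1] and d[0] < hi}
        for d in overlapping:
            # Simulate the tracked deletion running to completion.
            self.tracked_deletions.discard(d)
            self.persisted_deletions.discard(d)
        return "OK"  # Returned immediately when nothing is tracked.

    def clear_filtering_metadata(self):
        self.tracked_deletions.clear()

    def recover_filtering_metadata(self):
        self.tracked_deletions = set(self.persisted_deletions)


def clone_with_if(shard, rng):
    """Single check: cloning starts after one wait, whatever its outcome."""
    if shard.has_overlapping_deletion_doc(rng):
        shard.wait_for_clean(rng)
    # True means cloning starts while a deletion document still overlaps.
    return shard.has_overlapping_deletion_doc(rng)


def clone_with_while(shard, rng, max_attempts=3, between_attempts=lambda: None):
    """Re-check the persisted documents until none overlap, or give up."""
    attempts = 0
    while shard.has_overlapping_deletion_doc(rng):
        if attempts == max_attempts:
            raise TimeoutError("overlapping range deletion still pending")
        shard.wait_for_clean(rng)
        attempts += 1
        between_attempts()
    # Always False here: the loop only exits once no document overlaps.
    return shard.has_overlapping_deletion_doc(rng)
```

In this model, the 'if' variant starts cloning over a pending deletion as soon as the metadata has been cleared, while the 'while' variant either waits until the deletion document is gone or fails with a timeout, and never starts cloning over it.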
- backported by: SERVER-66433 Backport deadline waiting for overlapping range deletion to finish to pre-v5.1 versions (Closed)
- is caused by: SERVER-52906 moveChunk after failed migration that rolled back cloning indexes can hang indefinitely due to missing shard key index (Closed)