- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Sharding
- Sharding EMEA
- Fully Compatible
- ALL
- v4.4
- Sharding 2020-04-06, Sharding 2020-04-20, Sharding 2020-05-04, Sharding 2020-05-18, Sharding 2020-07-13, Sharding 2020-06-01, Sharding 2020-06-15, Sharding 2020-06-29, Sharding 2020-07-27, Sharding 2020-08-24
When the resumable range deleter is disabled, the recipient of a chunk first removes any potentially orphaned documents in the incoming range, and only afterwards clones the necessary indexes from the donor.
However, the range deleter relies on the shard key index to perform deletions, so the deletion can run before that index exists on the recipient.
This can lead to the following scenario:
1. A moveChunk begins
2. The shard key is refined
3. The moveChunk fails on the recipient for some reason, causing the entire moveChunk to fail
4. The moveChunk is restarted, now with a refined shard key
5. The recipient of the moveChunk attempts to delete the incoming range using the range deleter with the refined shard key
6. The range deleter loops infinitely because it is unable to find a shard key index.
There may be less convoluted scenarios that could cause this as well, but I'm having trouble thinking of one.
Repro attached.
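Separately from the attached repro, the scenario can be sketched as a toy model. This is not server code: `Recipient`, `find_shard_key_index`, `delete_range` and `clone_indexes` are hypothetical names, the shard key `{a: 1}` refined to `{a: 1, b: 1}` is an assumed example, and "an index whose leading fields equal the shard key" is a simplification of how a shard key index is chosen. The attempt cap exists only so the sketch terminates; the behaviour described above has no such cap.

```python
from dataclasses import dataclass, field


@dataclass
class Recipient:
    """Toy stand-in for the recipient shard's view of the collection."""
    # Each index is modelled as a tuple of field names, e.g. ("a", "b").
    indexes: list = field(default_factory=list)


def find_shard_key_index(indexes, shard_key):
    """Return the first index whose leading fields equal the shard key, else None."""
    for index in indexes:
        if index[:len(shard_key)] == shard_key:
            return index
    return None


def delete_range(recipient, shard_key, max_attempts=5):
    """Model a range deletion that keeps retrying while no shard key index exists.

    max_attempts is an assumption added so this sketch stops; without it the
    loop below would never exit when the index is missing.
    """
    for attempt in range(1, max_attempts + 1):
        if find_shard_key_index(recipient.indexes, shard_key) is not None:
            return f"deleted after {attempt} attempt(s)"
        # No usable index: nothing is deleted, and the task simply tries again.
    return f"gave up after {max_attempts} attempts (would loop forever)"


def clone_indexes(recipient, donor_indexes):
    """Copy any donor indexes the recipient does not already have."""
    for index in donor_indexes:
        if index not in recipient.indexes:
            recipient.indexes.append(index)


# Steps 1-3: the first moveChunk left the recipient with only the old key's index.
recipient = Recipient(indexes=[("a",)])
refined_key = ("a", "b")  # step 2: the shard key was refined

# Steps 4-6: the restarted moveChunk deletes the range before cloning indexes.
print(delete_range(recipient, refined_key))

# Once the donor's indexes are present, the same deletion completes.
clone_indexes(recipient, [("a",), ("a", "b")])
print(delete_range(recipient, refined_key))
```

The second call only shows that the deletion completes once a matching index is present on the recipient; it is not a claim about how the issue was actually fixed.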
- depends on: SERVER-69768 Include key pattern in range deletion task documents (Closed)
- is related to: SERVER-52906 moveChunk after failed migration that rolled back cloning indexes can hang indefinitely due to missing shard key index (Closed)
- related to: SERVER-79632 Stop range deletion when hashed shard key index does not exist (Closed)