This looks like a race condition between the migration of chunks to the secondary shard (including index creation there) and the drop index. With help from Max Hirschhorn, we theorize the following scenario:
1. The client runs an index build. Mongos broadcasts it to all shards.
2. On shard 1, the index build completes.
3. On shard 2, which has no data for the collection, no index is created and the collection is not implicitly created.
4. At a later time, a move chunk is initiated from shard 1 to shard 2.
5. On shard 2, the chunk is migrated, so the collection now exists, but its indexes have not yet been created.
6. The drop index is broadcast to both shards. It completes successfully on shard 1 but returns IndexNotFound on shard 2.
7. The indexes are then created on the collection on shard 2.
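The interleaving above can be replayed with a toy model. This is only a sketch: the `Shard` class, the namespace `test.c`, and the index name `a_1` are made up for illustration, and it assumes the recipient builds its indexes from a list captured from the donor before the drop ran. It does not reflect the actual server code.

```python
class Shard:
    """Toy stand-in for a shard: maps a namespace to its set of index names."""

    def __init__(self, name):
        self.name = name
        self.collections = {}

    def create_index(self, ns, index):
        # Step 3: a shard with no data for the collection does nothing.
        if ns not in self.collections:
            return "skipped"
        self.collections[ns].add(index)
        return "ok"

    def drop_index(self, ns, index):
        if index not in self.collections.get(ns, set()):
            return "IndexNotFound"
        self.collections[ns].discard(index)
        return "ok"


shard1, shard2 = Shard("shard1"), Shard("shard2")
shard1.collections["test.c"] = {"_id_"}  # only shard 1 holds data

# Steps 1-3: the index build is broadcast to both shards.
build = [s.create_index("test.c", "a_1") for s in (shard1, shard2)]

# Step 4: migration starts; ASSUMPTION: the recipient captures the
# donor's index list at this point, before the drop arrives.
index_snapshot = set(shard1.collections["test.c"])

# Step 5: the collection now exists on shard 2, with no indexes yet.
shard2.collections["test.c"] = set()

# Step 6: the drop is broadcast to both shards.
drop = [s.drop_index("test.c", "a_1") for s in (shard1, shard2)]

# Step 7: the recipient creates indexes from the earlier snapshot.
shard2.collections["test.c"] |= index_snapshot

print(build, drop)
print(sorted(shard1.collections["test.c"]), sorted(shard2.collections["test.c"]))
```

Under these assumptions the two shards end up with different index sets: the index is gone on shard 1 but present on shard 2.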
Some thoughts on possible solutions:
1. Instead of returning IndexNotFound, return a different error code that would allow mongos to trigger a retry in this scenario.
2. Have dropIndexes block until collection "cloning" completes on the secondary shard.
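As a rough illustration of option 1, the router-side retry could look something like the sketch below. The status string `MigrationInProgress`, the `broadcast_drop` helper, and the per-shard callables are all hypothetical names invented here; none of them is an existing server error code or mongos API.

```python
RETRYABLE = "MigrationInProgress"  # hypothetical code, not a real server error


def broadcast_drop(shard_calls, max_attempts=3):
    """Send the drop to every shard, retrying shards that ask for a retry.

    shard_calls maps a shard name to a zero-argument callable that
    performs the drop on that shard and returns a status string.
    """
    results = {}
    pending = dict(shard_calls)
    for _ in range(max_attempts):
        still_pending = {}
        for name, call in pending.items():
            status = call()
            results[name] = status
            if status == RETRYABLE:
                still_pending[name] = call
        pending = still_pending
        if not pending:
            break
    return results


def make_recipient():
    # Simulated recipient: mid-migration on the first call, done on the second.
    replies = iter([RETRYABLE, "ok"])
    return lambda: next(replies)


results = broadcast_drop({"shard1": lambda: "ok", "shard2": make_recipient()})
print(results)
```

A real implementation would presumably also need a backoff and a bound on total wait time; the sketch only shows the retry loop itself.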
Backlog - Sharding Team: I'm going to pass this on to you guys to have a look at the possible solutions.
- is related to SERVER-31715: createIndexes (and dropIndexes) may not create index (or leave index around) if migration happens concurrently (Closed)
- related to SERVER-31732: Recipient shards of migrations hold DBLock in X mode while creating indexes (Closed)