- Type: Task
- Resolution: Won't Fix
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: None
- None
- Serverless
The `ShardSplitDonorOpObserver` removes access blockers when the state document is deleted by the TTL index (`ShardSplitDonorOpObserver::onDelete`). However, it does not check whether the access blocker is currently in use by another shard split operation for the same tenant. This creates a race condition in which a previously aborted shard split removes the blocker for `tenant1` while an ongoing shard split is still using it.
Scenario:
- commitShardSplit is started for tenant1 with UUID 1
- commitShardSplit fails and the state document becomes "aborted"
- forgetShardSplit is called for UUID 1, and the TTL index is activated
- commitShardSplit is started for tenant1 with UUID 2
- The TTL index removes the state document for commitShardSplit UUID 1. The same operation also removes the access blocker for tenant1.
- commitShardSplit UUID 2 crashes on an invariant failure (or exhibits other undefined behavior) because it expects to have an access blocker.
This leads to a crash, but it can also lead to data inconsistency before the crash happens: writes succeed when they should not, because the blocker has been removed.
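The race can be illustrated with a small model. This is only a sketch in Python, not the server's C++ code: `BlockerRegistry`, `remove_unchecked`, `remove_if_owner`, and `run_scenario` are hypothetical names, and the ownership check is one possible guard, assumed here for illustration. The model also assumes that starting a second split simply takes over the tenant's blocker.

```python
from dataclasses import dataclass, field


@dataclass
class BlockerRegistry:
    """Toy stand-in for the per-tenant access blocker registry (hypothetical)."""

    # tenant id -> UUID of the shard split that currently owns the blocker
    owners: dict = field(default_factory=dict)

    def add(self, tenant, split_id):
        # Simplification: a new split takes over the tenant's blocker.
        self.owners[tenant] = split_id

    def remove_unchecked(self, tenant, split_id):
        # Models the behavior described in this ticket: the blocker is removed
        # whenever a state document for the tenant is deleted, regardless of
        # which split the document belonged to.
        self.owners.pop(tenant, None)

    def remove_if_owner(self, tenant, split_id):
        # Possible guard (an assumption, not the server's code): remove the
        # blocker only if it still belongs to the split whose document expired.
        if self.owners.get(tenant) == split_id:
            del self.owners[tenant]


def run_scenario(remove):
    """Replay the scenario; return True if UUID 2 still has its blocker."""
    registry = BlockerRegistry()
    registry.add("tenant1", "uuid-1")       # commitShardSplit UUID 1 starts
    # UUID 1 aborts; forgetShardSplit marks its document for TTL deletion.
    registry.add("tenant1", "uuid-2")       # commitShardSplit UUID 2 starts
    remove(registry, "tenant1", "uuid-1")   # TTL deletes UUID 1's document
    return "tenant1" in registry.owners


if __name__ == "__main__":
    print("unchecked removal keeps blocker:", run_scenario(BlockerRegistry.remove_unchecked))
    print("owner-checked removal keeps blocker:", run_scenario(BlockerRegistry.remove_if_owner))
```

In this model, the unchecked removal leaves UUID 2 without a blocker, which corresponds to the invariant failure above, while the owner-checked variant leaves it in place.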
- is related to: SERVER-61717 Ensure a POS instance remains in the POS map until the instance's run() is complete (Open)
- related to: SERVER-65236 Make tenant migration donor delete its state doc in its run method (Closed)