Type: Bug
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: 7.0.0, 8.0.0-rc0, 8.1.0-rc0
Component/s: None
Catalog and Routing
As part of BF-34016 we discovered that taking a lock in the range deletion's op observer can cause a deadlock when an update on the rangeDeletions collection comes from a migration being aborted as part of the migration recovery procedure.
The confirmed sequence of the deadlock is:
1. An uncommitted multi-document transaction holds the user collection lock in MODE_IX on shard 0.
2. Resharding enqueues a MODE_X lock request for the user collection's database on shard 0 (behaviour described in SERVER-86727). This lock request enqueues behind the multi-document transaction at (1).
3. Shard 1’s “RecoverRefreshThread” sends a config.rangeDeletions update to shard 0 to recover its chunk migration abort decision. The op observer reacts to the write by requesting a MODE_IS lock against the user collection, which enqueues behind (2).
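To illustrate why the MODE_IS request blocks even though MODE_IS is compatible with the MODE_IX that is already held, here is a minimal Python sketch of a fair (FIFO) lock queue on a single resource. It is a toy model, not the server's lock manager: the `ResourceLock` class, the owner names, and the rule that a new request must be compatible with both granted and pending requests are assumptions made for illustration. It also collapses the database and collection locks into one resource.

```python
from collections import deque

# Standard multi-granularity lock compatibility: mode -> modes it can coexist with.
COMPATIBLE = {
    "IS": {"IS", "IX", "S"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "X": set(),
}


class ResourceLock:
    """Toy lock on one resource with a fair queue (hypothetical model)."""

    def __init__(self):
        self.granted = []        # (owner, mode) pairs currently holding the lock
        self.pending = deque()   # (owner, mode) pairs waiting, in arrival order

    def request(self, owner, mode):
        # Assumption: to avoid starving writers, a new request is granted only
        # if it is compatible with every granted AND every pending request.
        ok_with_granted = all(mode in COMPATIBLE[m] for _, m in self.granted)
        ok_with_pending = all(mode in COMPATIBLE[m] for _, m in self.pending)
        if ok_with_granted and ok_with_pending:
            self.granted.append((owner, mode))
            return "granted"
        self.pending.append((owner, mode))
        return "queued"


db_lock = ResourceLock()
results = [
    db_lock.request("txn", "IX"),          # step 1: transaction holds MODE_IX
    db_lock.request("resharding", "X"),    # step 2: MODE_X queues behind step 1
    db_lock.request("op_observer", "IS"),  # step 3: MODE_IS queues behind step 2
]
print(results)  # ['granted', 'queued', 'queued']
```

In this model the op observer's request is stuck purely because of the pending MODE_X, which is why avoiding the lock acquisition in the op observer would remove the edge from (3) to (2).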
The uncommitted multi-doc transaction at (1) waits for shard 1 to complete its recovery to send a statement and commit. However, shard 1 waits for shard 0 to complete the update at (3) to complete its recovery; the update at (3) cannot complete because of (2) which in turn depends on the uncommitted transaction at (1). The cycle is 1 -> 3 -> 2 -> 1.
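The dependency cycle can be checked mechanically by treating the three steps as nodes in a wait-for graph. The sketch below is a generic depth-first cycle search in Python; the `find_cycle` helper and the integer node labels are illustrative assumptions and do not correspond to any server code.

```python
def find_cycle(waits_for):
    """Return one cycle as a list of nodes (first node repeated at the end), or None."""
    visiting = []   # nodes on the current DFS path, in order
    done = set()    # nodes fully explored without finding a cycle

    def dfs(node):
        if node in visiting:
            # Back edge: the path from the first visit of `node` forms a cycle.
            return visiting[visiting.index(node):] + [node]
        if node in done:
            return None
        visiting.append(node)
        for nxt in waits_for.get(node, ()):
            cycle = dfs(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in waits_for:
        cycle = dfs(start)
        if cycle:
            return cycle
    return None


# Edges read "X waits for Y", using the step numbers from the ticket:
# (1) the transaction waits for (3) shard 1's recovery update,
# (3) the update waits for (2) the queued MODE_X request,
# (2) the MODE_X request waits for (1) the transaction's MODE_IX lock.
waits_for = {1: [3], 3: [2], 2: [1]}
cycle = find_cycle(waits_for)
print(" -> ".join(str(n) for n in cycle))  # 1 -> 3 -> 2 -> 1

# Removing the edge from (3) to (2), i.e. not taking MODE_IS, breaks the cycle.
no_lock = find_cycle({1: [3], 3: [], 2: [1]})
```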
The goal of the ticket is to understand if we can access the CSS without holding a MODE_IS lock.