  1. Core Server
  2. SERVER-92463

Remove the post-commit user collection's lock acquisition upon writes to config.rangeDeletions

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: 7.0.0, 8.0.0-rc0, 8.1.0-rc0
    • Component/s: None
    • None
    • Catalog and Routing
    • ALL
    • 200
    • 2

      As part of BF-34016 we discovered that the lock taken by the range deletion op observer can cause a deadlock when an update on the rangeDeletions collection comes from a migration being aborted as part of the migration recovery procedure.

      The confirmed sequence of the deadlock is:

      1. An uncommitted multi-document transaction holds the user collection lock in MODE_IX on shard 0.
      2. Resharding enqueues a MODE_X lock request for the user collection's database on shard 0 (behaviour described in SERVER-86727). This lock request enqueues behind the multi-doc transaction at (1).
      3. On shard 1, the “RecoverRefreshThread” sends a config.rangeDeletions update to shard 0 to recover its chunk migration abort decision. On shard 0, the op observer reacts to the write by requesting a MODE_IS lock against the user collection, which enqueues behind (2).
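
      The queueing in steps (1) to (3) can be illustrated with a toy model. This is only a sketch, not the server's lock manager: ToyLock and the owner names are hypothetical, the compatibility table is the standard intent-lock matrix, and the model assumes that a collection-level intent lock also takes an intent lock of the same mode on the database, and that a new request waits if it conflicts with anything already granted or queued.

```python
# Toy model of a fair (FIFO) lock queue on the database resource.
# Hypothetical sketch; NOT MongoDB's actual lock manager.

# Standard intent-lock compatibility matrix: COMPATIBLE[a] is the set of
# modes that may be held concurrently with mode a.
COMPATIBLE = {
    "IS": {"IS", "IX", "S"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "X": set(),
}


class ToyLock:
    def __init__(self):
        self.granted = []  # (owner, mode) pairs currently holding the lock
        self.pending = []  # (owner, mode) pairs waiting, in arrival order

    def request(self, owner, mode):
        """Return True if granted immediately, False if enqueued."""
        # Assumption: to stay fair, a request must be compatible with every
        # granted AND every already-queued request before it is granted.
        ahead = self.granted + self.pending
        if all(held in COMPATIBLE[mode] for _, held in ahead):
            self.granted.append((owner, mode))
            return True
        self.pending.append((owner, mode))
        return False


db_lock = ToyLock()
results = [
    db_lock.request("txn", "IX"),          # (1) multi-doc transaction
    db_lock.request("resharding", "X"),    # (2) waits behind (1)
    db_lock.request("op_observer", "IS"),  # (3) compatible with (1), but waits behind (2)
]
print(results)  # [True, False, False]
```

      Note that MODE_IS is compatible with the granted MODE_IX; in this model it is only the queued MODE_X request that makes (3) wait.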

      The uncommitted multi-doc transaction at (1) cannot send its next statement and commit until shard 1 completes its recovery. Shard 1, however, cannot complete its recovery until shard 0 completes the update at (3). The update at (3) is blocked behind (2), which in turn is blocked behind the uncommitted transaction at (1). The cycle is 1 -> 3 -> 2 -> 1.
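
      As a sanity check, the dependencies above can be written down as a wait-for graph and walked until a node repeats. This is a minimal sketch: the integer node labels refer to the numbered steps, and find_cycle is a hypothetical helper that assumes each waiter blocks on exactly one other party.

```python
# Wait-for graph for the sequence above: key waits for value.
WAITS_FOR = {
    1: 3,  # the transaction waits for shard 1's recovery, i.e. the update at (3)
    3: 2,  # the update's MODE_IS request is queued behind the MODE_X request
    2: 1,  # the MODE_X request is queued behind the transaction's MODE_IX
}


def find_cycle(graph, start):
    """Follow wait-for edges from start; return the cycle as a list, or None."""
    path = []
    node = start
    while node is not None and node not in path:
        path.append(node)
        node = graph.get(node)
    if node is None:
        return None  # reached a party that is not waiting on anyone
    # Close the loop by repeating the first node of the cycle at the end.
    return path[path.index(node):] + [node]


cycle = find_cycle(WAITS_FOR, 1)
print(" -> ".join(str(n) for n in cycle))  # 1 -> 3 -> 2 -> 1
```

      Removing any single edge breaks the cycle; this ticket targets the 3 -> 2 edge, i.e. the MODE_IS acquisition in the op observer.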

      The goal of the ticket is to understand if we can access the CSS (CollectionShardingState) without holding a MODE_IS lock.

            Assignee:
            Unassigned
            Reporter:
            Enrico Golfieri (enrico.golfieri@mongodb.com)
            Votes:
            0
            Watchers:
            5

              Created:
              Updated: