  1. Core Server
  2. SERVER-77018

Deadlock between dbStats and 2 index builds

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical - P2
    • Fix Version/s: 7.1.0-rc0, 6.3.2, 6.0.7, 5.0.19, 7.0.0-rc2
    • Affects Version/s: 7.0.0-rc0, 6.3.1
    • Component/s: None
    • Labels: None
    • Assigned Teams: Storage Execution
    • Backwards Compatibility: Minor Change
    • Operating System: ALL
    • Backport Requested: v7.0, v6.3, v6.0, v5.0
    • Sprint: Execution Team 2023-05-29

      If an ongoing index build (IndexBuild_1) yields its locks after initiating a bulk insert (which is initialized here), it still holds the write lock on the index table at the WiredTiger level. If a dbStats command then comes in, it takes a collection-level MODE_IS lock and attempts to acquire a read lock on the ident the index build is currently writing to, but cannot, since IndexBuild_1 holds the exclusive lock on that ident. (dbStats can see the in-progress index table because collection_impl.cpp iterates through the unfinished indexes.)
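
      As a rough illustration of why dbStats is not stopped at the collection level and only gets stuck on the WiredTiger table lock, here is a minimal sketch of the standard multi-granularity lock compatibility matrix. It assumes the server's MODE_IS / MODE_IX / MODE_S / MODE_X modes follow the textbook semantics; the names COMPATIBLE and conflicts are hypothetical and not taken from the server code.

```python
# Standard intent-lock compatibility matrix (assumption: the server's
# MODE_* collection locks follow these textbook semantics).
# COMPATIBLE[held] is the set of modes that can be granted alongside `held`.
COMPATIBLE = {
    "MODE_IS": {"MODE_IS", "MODE_IX", "MODE_S"},
    "MODE_IX": {"MODE_IS", "MODE_IX"},
    "MODE_S": {"MODE_IS", "MODE_S"},
    "MODE_X": set(),
}


def conflicts(held: str, requested: str) -> bool:
    """Return True if `requested` cannot be granted while `held` is held."""
    return requested not in COMPATIBLE[held]


# dbStats (MODE_IS) and an index build (MODE_IX) can share the collection lock.
print(conflicts("MODE_IX", "MODE_IS"))
# An exclusive collection lock request conflicts with a held MODE_IS.
print(conflicts("MODE_IS", "MODE_X"))
```

      Under these assumptions MODE_IS and MODE_IX never conflict with each other, so the only resource dbStats can block on here is the table-level lock held inside the storage engine.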
        

      The problem arises when another operation comes in and prevents IndexBuild_1 from reacquiring its locks, such as another index build (IndexBuild_0) that enqueues a collection MODE_X lock request. Together, these events can deadlock the system, as represented by:

      dbStats
      • [Global, DB, Coll] - MODE_IS
      • holds coll lock - MODE_IS
      • waiting on read lock of table:index-X

      IndexBuild_0
      • [Global, DB] - MODE_IX
      • waiting for MODE_X coll lock
      • blocks IndexBuild_1

      IndexBuild_1
      • [Global, DB, Coll] - MODE_IX
      • yields MDB level locks
      • holds write lock on table:index-X
      • waiting to reacquire locks
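
      The three operations above wait on each other in a cycle. A minimal sketch of that wait-for graph, with a simple cycle check, makes the deadlock explicit. The edges are read off the description in this ticket; the WAITS_FOR mapping and find_cycle helper are hypothetical names for illustration and are not part of the server.

```python
from typing import Dict, List, Optional

# Wait-for edges: each operation maps to the operation it is blocked on.
WAITS_FOR: Dict[str, str] = {
    # dbStats needs a read lock on table:index-X, held by IndexBuild_1.
    "dbStats": "IndexBuild_1",
    # IndexBuild_1 cannot reacquire its locks behind IndexBuild_0's MODE_X request.
    "IndexBuild_1": "IndexBuild_0",
    # IndexBuild_0's MODE_X request conflicts with the MODE_IS held by dbStats.
    "IndexBuild_0": "dbStats",
}


def find_cycle(waits_for: Dict[str, str], start: str) -> Optional[List[str]]:
    """Follow wait-for edges from `start`; return the cycle found, or None."""
    path: List[str] = []
    seen: Dict[str, int] = {}
    node = start
    while node in waits_for:
        if node in seen:
            # We came back to a node already on the path: that suffix is a cycle.
            return path[seen[node]:]
        seen[node] = len(path)
        path.append(node)
        node = waits_for[node]
    # The chain ended at an operation that is not waiting on anything.
    return None


print(find_cycle(WAITS_FOR, "dbStats"))
```

      Removing any single edge (for example, if IndexBuild_1 released the table lock before yielding) breaks the cycle in this model, and find_cycle returns None.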

       

      Original explanation by Suganthi here

            Assignee: Yujin Kang Park (yujin.kang@mongodb.com)
            Reporter: Fausto Leyva (Inactive) (fausto.leyva@mongodb.com)