Core Server / SERVER-95544

setFeatureCompatibilityVersion, createCollection and moveCollection could cause a 3-way deadlock in config shards

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: 9.0 Required, 8.0.0, 8.0.1, 8.0.2
    • Component/s: Sharding
    • Catalog and Routing
    • ALL
    • Steps to Reproduce:

      1. Apply the attached patch in revision
      2. Run the test with python buildscripts/resmoke.py run --suites=sharding jstests/sharding/reproduce_setFCV_moveColl_createColl_deadlock.js

      Beware that this is a deadlock, so the test will not finish for at least 5 minutes, which is when the LockBusy error will be thrown by the createCollection thread.

    • CAR Team 2024-10-28, CAR Team 2024-11-11, CAR Team 2024-11-25
    • 2

      Cluster DDL operations usually perform FCV checks at the beginning of command execution in order to determine whether a different code path is required for newer versions. createCollection is an example of this: depending on the enabled feature flag and the version, we may or may not launch a coordinator (this was added as part of SERVER-81190). The idea is usually to perform these FCV checks before holding any DDL locks.

      SERVER-81960 added an FCV region to the configsvrReshardCollection command in order to support the new moveCollection operation. The current design of resharding requires the orchestration to happen on the config server, but only after the primary db shard is holding the necessary DDL lock to serialize with other cluster-level DDL. So a resharding operation first goes to the primary db shard, acquires the DDL lock for the collection, and then goes to the config server.

      This is usually fine, but in config shards all the resharding coordinators might end up instantiated on the same shard, and if a setFeatureCompatibilityVersion command sneaks in at the right time, we might end up with the following interleaving:

      t1: reshardCollection acquires the DDL lock when creating the db primary shard coordinator
      t2: createCollection instantiates an FCV region
      t2: createCollection tries to acquire the DDL lock, but ends up waiting for t1
      t3: setFeatureCompatibilityVersion tries to acquire an exclusive lock, but ends up waiting for t2
      t1: in a remote request to itself, configsvrReshardCollection tries to instantiate an FCV region, but ends up enqueued behind t3

      This causes a 3-way deadlock. For a customer, all DDL operations for the database and collection being moved, as well as setFeatureCompatibilityVersion commands, will block for 5 minutes until t2 fails to acquire the DDL lock with LockBusy, which then destroys the FCV region and allows t3 to finish, followed by t1. Any other operation trying to acquire an FCV region (like timeseries batch writes) would also block until the cluster goes back to normal.
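
      For illustration, the shape of the three concurrent operations looks roughly like the jstest-style sketch below. This is not the attached deadlock_repro.js: the database/collection names and the thread layout are assumptions, and without the failpoints added by the attached patch the operations will normally just serialize instead of hitting the exact interleaving above.

      import {Thread} from "jstests/libs/parallelTester.js";

      // Config shard cluster; with configShard: true, shard0 is the config server acting as a shard.
      const st = new ShardingTest({shards: 2, configShard: true});
      const dbName = "deadlockDB";

      // Make the config shard the db primary shard so that both the db primary shard
      // coordinator and configsvrReshardCollection run on the same node.
      assert.commandWorked(
          st.s.adminCommand({enableSharding: dbName, primaryShard: st.shard0.shardName}));
      assert.commandWorked(st.s.getDB(dbName).createCollection("toMove"));

      // t1: moveCollection acquires the DDL lock on the db primary shard, then issues a
      // remote _configsvrReshardCollection request against the same (config) shard.
      const moveColl = new Thread(function(host, ns, toShard) {
          new Mongo(host).adminCommand({moveCollection: ns, toShard: toShard});
      }, st.s.host, dbName + ".toMove", st.shard1.shardName);

      // t2: createCollection enters an FCV region, then queues on the DDL lock behind t1.
      const createColl = new Thread(function(host, dbName) {
          new Mongo(host).getDB(dbName).createCollection("newColl");
      }, st.s.host, dbName);

      // t3: setFeatureCompatibilityVersion queues behind t2's FCV region.
      const setFCV = new Thread(function(host, targetFCV) {
          new Mongo(host).adminCommand({setFeatureCompatibilityVersion: targetFCV, confirm: true});
      }, st.s.host, lastLTSFCV);

      // Not asserting success: in the deadlock scenario t2 eventually fails with LockBusy.
      moveColl.start();
      createColl.start();
      setFCV.start();
      moveColl.join();
      createColl.join();
      setFCV.join();
      st.stop();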

      One way of solving this is to move the FCV check out of the configsvrReshardCollection command and do it in the _shardsvrReshardCollection command, like all other cluster DDL does. Another way is to think hard about how to avoid holding the FCV region in the create command while the create is running, but we could still have a potentially dangerous situation if we leave the FCV region in the configsvrReshardCollection command. A reproduction of this is attached.

    • Attachment: deadlock_repro.js (6 kB, uploaded by Marcos José Grillo Ramirez)

            Assignee: Enrico Golfieri (enrico.golfieri@mongodb.com)
            Reporter: Marcos José Grillo Ramirez (marcos.grillo@mongodb.com)
            Votes: 0
            Watchers: 9
