  1. Core Server
  2. SERVER-91997

moveCollection could cause a deadlock with a concurrent setFeatureCompatibilityVersion command

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 8.1.0-rc0, 8.0.0-rc13
    • Affects Version/s: 8.0.0-rc10
    • Component/s: Sharding
    • None
    • Catalog and Routing
    • Fully Compatible
    • ALL
    • v8.0
    • Apply the attached patch and run the test in version 5c6bf7ad5
    • CAR Team 2024-07-08, CAR Team 2024-07-22
    • 1

      SERVER-89997 added a check to moveCollection that prevents moving timeseries collections. This check uses a FixedFCVRegion, which holds the FCV lock in shared mode. However, within the same command we execute a remote request while still holding this lock, and that request also uses a FixedFCVRegion. This opens the possibility of a deadlock if a setFeatureCompatibilityVersion command sneaks in between the two acquisitions.

      The following scenario illustrates such a deadlock:

      1. Thread 0 receives a moveCollection (which under the hood is a _shardsvrReshardCollection with moveCollection provenance), acquires the FCV lock in shared mode, and calls _shardsvrCreateCollection while still holding it
      2. Thread 1 receives a setFeatureCompatibilityVersion with the kStart phase, which enqueues an exclusive lock request when trying to change the FCV to kDowngrading
      3. Thread 2 receives the _shardsvrCreateCollection and tries to acquire the FCV lock in shared mode

      This causes the deadlock: Thread 2 cannot acquire the lock, even though it requests it in shared mode, because Thread 1's exclusive request is queued ahead of it. Thread 1 is in turn waiting for Thread 0, which will not release its resources until Thread 2 finishes. You can find this scenario in the attached repro.
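
      The circular wait can be sketched with a toy model. This is not the server's lock manager: the class FairSharedExclusiveLock, the helper find_cycle, and the thread names are hypothetical, and the model simply assumes a FIFO policy in which a shared request is not granted while a conflicting exclusive request is queued ahead of it, as the scenario describes. Nothing here actually blocks; the code only records who waits for whom.

```python
from collections import deque


class FairSharedExclusiveLock:
    """Toy FIFO lock (hypothetical, not MongoDB code): a request is granted only
    if it conflicts with no current holder and nothing is queued ahead of it."""

    def __init__(self):
        self.holders = {}     # owner -> mode, "S" (shared) or "X" (exclusive)
        self.queue = deque()  # pending (owner, mode) requests in arrival order

    def request(self, owner, mode):
        no_conflict = not self.holders or (
            mode == "S" and all(m == "S" for m in self.holders.values())
        )
        if no_conflict and not self.queue:
            self.holders[owner] = mode
            return True
        self.queue.append((owner, mode))
        return False

    def blockers(self, owner):
        """Owners that must release or be served before `owner` is granted."""
        ahead = []
        for queued_owner, queued_mode in self.queue:
            if queued_owner == owner:
                mode = queued_mode
                break
            ahead.append((queued_owner, queued_mode))
        else:
            return set()  # owner has no pending request
        # Two requests conflict when at least one of them is exclusive.
        blocking = {o for o, m in self.holders.items() if "X" in (mode, m)}
        blocking |= {o for o, m in ahead if "X" in (mode, m)}
        return blocking


def find_cycle(wait_for, start):
    """Follow one wait-for edge per node from `start`; return a cycle or None."""
    path = [start]
    while True:
        targets = sorted(wait_for.get(path[-1], ()))
        if not targets:
            return None
        if targets[0] in path:
            return path[path.index(targets[0]):] + [targets[0]]
        path.append(targets[0])


fcv_lock = FairSharedExclusiveLock()
granted = {
    "thread0": fcv_lock.request("thread0", "S"),  # moveCollection
    "thread1": fcv_lock.request("thread1", "X"),  # setFeatureCompatibilityVersion
    "thread2": fcv_lock.request("thread2", "S"),  # _shardsvrCreateCollection
}
wait_for = {
    "thread0": {"thread2"},  # waits for the remote request, not for the lock
    "thread1": fcv_lock.blockers("thread1"),
    "thread2": fcv_lock.blockers("thread2"),
}
cycle = find_cycle(wait_for, "thread0")
print(granted)
print(" -> ".join(cycle))  # thread0 -> thread2 -> thread1 -> thread0
```

      In this model only Thread 0 is granted the lock, and the wait-for graph closes into a cycle. Without Thread 1's queued exclusive request, the two shared acquisitions would be compatible and Thread 2 would proceed.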

            Assignee:
            marcos.grillo@mongodb.com Marcos José Grillo Ramirez
            Reporter:
            marcos.grillo@mongodb.com Marcos José Grillo Ramirez
            Votes:
            0
            Watchers:
            7
