Type: Bug
Resolution: Fixed
Priority: Major - P3
Affects Version/s: 8.0.0-rc10
Component/s: Sharding
None
Catalog and Routing
Fully Compatible
ALL
v8.0
CAR Team 2024-07-08, CAR Team 2024-07-22
1
SERVER-89997 added a check to moveCollection that prevents moving timeseries collections. This check uses a FixedFCVRegion, which holds the FCV lock in shared mode. However, in the same command we send a remote request while still holding this lock, and that request also uses a FixedFCVRegion. This opens the possibility of a deadlock if a setFeatureCompatibilityVersion command sneaks in between.
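To see why a shared request can block even though the lock is only held in shared mode, here is a minimal Python sketch that models the FCV lock as a writer-preferring shared/exclusive lock: once an exclusive request is queued, new shared requests wait behind it. The class `FairSharedExclusiveLock` and the `demo` function are hypothetical and are not the server's lock implementation; timeouts stand in for the indefinite wait so the sketch terminates.

```python
import threading
import time


class FairSharedExclusiveLock:
    """Hypothetical writer-preferring lock used to model the FCV lock.

    Once an exclusive request is queued, new shared requests wait behind it,
    even if the lock is currently held only in shared mode.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._shared_holders = 0
        self._exclusive_held = False
        self._exclusive_waiters = 0

    def acquire_shared(self, timeout=None):
        with self._cond:
            # New shared requests must not overtake a queued or active exclusive one.
            ok = self._cond.wait_for(
                lambda: not self._exclusive_held and self._exclusive_waiters == 0,
                timeout,
            )
            if ok:
                self._shared_holders += 1
            return ok

    def release_shared(self):
        with self._cond:
            self._shared_holders -= 1
            self._cond.notify_all()

    def acquire_exclusive(self, timeout=None):
        with self._cond:
            self._exclusive_waiters += 1
            try:
                ok = self._cond.wait_for(
                    lambda: not self._exclusive_held and self._shared_holders == 0,
                    timeout,
                )
                if ok:
                    self._exclusive_held = True
                return ok
            finally:
                # Whether granted or timed out, this request is no longer queued.
                self._exclusive_waiters -= 1
                self._cond.notify_all()

    def release_exclusive(self):
        with self._cond:
            self._exclusive_held = False
            self._cond.notify_all()

    def has_exclusive_waiter(self):
        with self._cond:
            return self._exclusive_waiters > 0


def demo():
    """Return whether the nested shared request succeeds while setFCV is queued."""
    lock = FairSharedExclusiveLock()
    lock.acquire_shared()  # "Thread 0": moveCollection holds the FCV lock shared

    def set_fcv():  # "Thread 1": setFCV queues an exclusive request
        if lock.acquire_exclusive(timeout=2.0):
            lock.release_exclusive()

    t1 = threading.Thread(target=set_fcv)
    t1.start()
    while not lock.has_exclusive_waiter():
        time.sleep(0.01)

    result = {}

    def nested_request():  # "Thread 2": remote _shardsvrCreateCollection
        result["ok"] = lock.acquire_shared(timeout=0.2)
        if result["ok"]:
            lock.release_shared()

    t2 = threading.Thread(target=nested_request)
    t2.start()
    t2.join()

    lock.release_shared()  # lets the demo finish; in the bug, Thread 0 never gets here
    t1.join()
    return result["ok"]


if __name__ == "__main__":
    print("nested shared acquire succeeded:", demo())
```

In this model `demo()` returns `False`: the second shared request times out while the exclusive request is queued, even though shared mode is compatible with the existing shared hold. Without the timeouts, and with the first holder waiting on the second request, the wait would never end.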
The following scenario illustrates such a deadlock:
- Thread 0 receives a moveCollection (which under the hood is a _shardsvrReshardCollection with moveCollection provenance), acquires the FCV lock in shared mode and, while still holding it, sends a _shardsvrCreateCollection request
- Thread 1 receives a setFeatureCompatibilityVersion in the kStart phase, which enqueues an exclusive FCV lock request in order to change the FCV to kDowngrading; this request waits for Thread 0 to release its shared lock
- Thread 2 receives the _shardsvrCreateCollection and tries to acquire the FCV lock in shared mode
This causes the deadlock: Thread 2 cannot acquire the lock even though it only requests shared mode, because Thread 1's exclusive request is already queued ahead of it. Thread 1 is waiting for Thread 0 to release its shared lock, and Thread 0 will not release its resources until Thread 2 finishes. You can find this scenario in the attached repro.
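The circular wait above can also be written as a wait-for graph, where an edge A -> B means "A cannot make progress until B does". The minimal Python sketch below encodes the three threads under hypothetical node names and finds the cycle with a depth-first search; it only models the dependencies described in this ticket and is not server code.

```python
from typing import Dict, List, Optional

# Hypothetical wait-for graph for the scenario: A -> B means A waits on B.
WAITS_FOR: Dict[str, List[str]] = {
    # Thread 0 holds the shared FCV lock and waits for the remote
    # _shardsvrCreateCollection (handled by Thread 2) to return.
    "thread0_moveCollection": ["thread2_shardsvrCreateCollection"],
    # Thread 2 wants shared mode but is queued behind Thread 1's exclusive request.
    "thread2_shardsvrCreateCollection": ["thread1_setFCV"],
    # Thread 1's exclusive request waits for Thread 0's shared hold to be released.
    "thread1_setFCV": ["thread0_moveCollection"],
}


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle (first node repeated at the end), or None if acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}
    stack: List[str] = []

    def dfs(node: str) -> Optional[List[str]]:
        color[node] = GRAY  # node is on the current DFS path
        stack.append(node)
        for nxt in graph.get(node, []):
            if color.get(nxt, WHITE) == GRAY:
                # Back edge: the path from nxt to node closes a cycle.
                return stack[stack.index(nxt):] + [nxt]
            if color.get(nxt, WHITE) == WHITE:
                found = dfs(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK  # fully explored, not part of a cycle via this path
        return None

    for start in graph:
        if color[start] == WHITE:
            cycle = dfs(start)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    # Prints: thread0_moveCollection -> thread2_shardsvrCreateCollection -> thread1_setFCV -> thread0_moveCollection
    print(" -> ".join(find_cycle(WAITS_FOR)))
```

If no setFeatureCompatibilityVersion is queued, Thread 2 can take the shared lock, so its outgoing edge disappears and the graph becomes acyclic. This matches the observation that the deadlock only occurs when a setFeatureCompatibilityVersion sneaks in.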
- is depended on by
  - SERVER-85646 Add testing coverage for movePrimary during upgrade/downgrade in v8.0 (Blocked)
- is related to
  - SERVER-89997 Do not track unsharded timeseries collection if they can't be moved (Closed)