-
Type: Bug
-
Resolution: Gone away
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: None
-
Sharding EMEA
-
ALL
-
135
ShardsvrCheckMetadataConsistencyParticipantCommand currently takes a DB lock in IS mode without opting out of taking the RSTL. This means that it will not be killed on stepdown (since it doesn't take the global lock in a mode that conflicts with writes).
(Edit: at the time that this deadlock was found, the command took the DB lock in S mode).
This can then cause a deadlock with prepared transactions if the transaction is holding the DB lock that checkMetadataConsistency is looking to acquire, but committing the transaction is blocked on a stepdown (that is, the node cannot replicate the commitTransaction command until it finishes stepping down).
The order of events is:
1. Prepare a transaction that holds the DB lock in IX mode for some db that checkMetadataConsistency might need to take a DB lock for.
2. ShardsvrCheckMetadataConsistencyParticipantCommand tries to take the DB lock for the db mentioned above and ends up holding the RSTL in IX mode while it waits.
3. The node tries to step down before it receives the commitTransaction command. The stepdown needs the RSTL in X mode, so it waits behind checkMetadataConsistency, and the transaction cannot commit until the stepdown completes.
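The cycle in the steps above can be sketched as a small wait-for graph. This is only an illustrative model, not server code: the actor names (`prepared_txn`, `check_metadata`, `stepdown`), the helper functions, and the `scenario` wrapper are hypothetical, and the only inputs are the standard IS/IX/S/X compatibility matrix plus the ordering described in this ticket.

```python
# Simplified model of the deadlock described above. Not MongoDB code.
# All names below are hypothetical and exist only for illustration.

# Standard multi-granularity lock compatibility: (held_mode, requested_mode).
COMPATIBLE = {
    ("IS", "IS"): True,  ("IS", "IX"): True,  ("IS", "S"): True,  ("IS", "X"): False,
    ("IX", "IS"): True,  ("IX", "IX"): True,  ("IX", "S"): False, ("IX", "X"): False,
    ("S", "IS"): True,   ("S", "IX"): False,  ("S", "S"): True,   ("S", "X"): False,
    ("X", "IS"): False,  ("X", "IX"): False,  ("X", "S"): False,  ("X", "X"): False,
}


def wait_for_graph(held, wanted, extra_edges=()):
    """Build {waiter: {blockers}} from held locks and pending lock requests.

    held:   {resource: {actor: mode}}
    wanted: {actor: (resource, mode)}
    extra_edges: (waiter, blocker) pairs for waits that are not lock waits.
    """
    graph = {}
    for actor, (resource, mode) in wanted.items():
        for holder, held_mode in held.get(resource, {}).items():
            if holder != actor and not COMPATIBLE[(held_mode, mode)]:
                graph.setdefault(actor, set()).add(holder)
    for waiter, blocker in extra_edges:
        graph.setdefault(waiter, set()).add(blocker)
    return graph


def has_cycle(graph):
    """Depth-first search for a cycle in the wait-for graph."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))


def scenario(check_db_mode):
    """Return True if the modeled ordering deadlocks for the given DB lock mode."""
    held = {
        "DB": {"prepared_txn": "IX"},      # step 1: prepared transaction
        "RSTL": {"check_metadata": "IX"},  # step 2: RSTL held while waiting
    }
    wanted = {
        "check_metadata": ("DB", check_db_mode),  # step 2: DB lock request
        "stepdown": ("RSTL", "X"),                # step 3: stepdown needs RSTL in X
    }
    # The commit cannot be replicated until the stepdown finishes.
    extra = [("prepared_txn", "stepdown")]
    return has_cycle(wait_for_graph(held, wanted, extra))


if __name__ == "__main__":
    print("S mode deadlocks:", scenario("S"))
    print("IS mode deadlocks:", scenario("IS"))
```

In this model the S-mode request closes the cycle (checkMetadataConsistency waits on the transaction, the transaction waits on the stepdown, and the stepdown waits on checkMetadataConsistency), while an IS-mode request is compatible with the transaction's IX lock and no cycle forms. The model says nothing about whether the command is killed on stepdown.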
A targeted way to fix this would be either to manually ensure that checkMetadataConsistency is killed by the stepdown thread or to make sure it does not hold the RSTL.
- is related to
-
SERVER-72895 Implement shardKey index check in checkConsistencyMetadata command
- Closed
-
SERVER-74667 Use lock-free read approch for checkMetadataConsistency command
- Closed
- related to
-
SERVER-75288 Investigate whether the stepdown killop thread should kill operations that hold the RSTL
- Open