Type: Bug
Resolution: Fixed
Priority: Blocker - P1
Fix Version/s: 7.0.0-rc0, 4.4.20, 5.0.16, 6.0.6, 6.3.0-rc3
Affects Version/s: 6.0.0, 4.4.15, 5.0.10, 6.3.0-rc2
Component/s: None
Labels: None
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v6.3
Sprint: Execution Team 2023-04-03
Case:
Confidence Status: None
Work Order: 0
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name: None
Goal Link: None

After yielding, operations restore their lock state via restoreLockState. This function iterates over each lock that was previously held and tries to reacquire it in sorted order. However, we never try to reacquire the FCV lock, which should be reacquired after the PBWM. When we then try to reacquire the RSTL, the check fails because the next lock in the list is actually the FCV lock (which we never checked for). We then acquire the global lock (including acquiring a read ticket) without holding the FCV lock or the RSTL.

Once that is done, we will reacquire all the other locks we held, which in this case includes the RSTL (but now out of order).
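
The reacquisition order described above can be sketched in Python. This is a simplified model under stated assumptions, not the server's code: locks are plain strings, the global lock is assumed to be stored separately from the saved list, and restore_order_without_fcv_check is a hypothetical name.

```python
# Simplified model of the restore path; NOT the server implementation.
# Lock names are plain strings and the function name is hypothetical.
PBWM, FCV, RSTL = "PBWM", "FCV", "RSTL"


def restore_order_without_fcv_check(saved):
    """Return the order in which locks are reacquired.

    `saved` lists the locks held before the yield, in sorted order.
    The global lock is assumed to be tracked separately from `saved`.
    """
    order = []
    i = 0
    # The PBWM, if held, is reacquired first.
    if i < len(saved) and saved[i] == PBWM:
        order.append(saved[i])
        i += 1
    # The RSTL is expected next, but there is no check for the FCV lock.
    # When the next saved lock is the FCV lock, this test fails and the
    # RSTL is not reacquired here.
    if i < len(saved) and saved[i] == RSTL:
        order.append(saved[i])
        i += 1
    # The global lock is taken next, which includes taking a read ticket.
    order.append("read ticket")
    order.append("Global")
    # Everything left over, including the FCV lock and the RSTL, follows.
    order.extend(saved[i:])
    return order


saved = [PBWM, FCV, RSTL, "collection"]
print(restore_order_without_fcv_check(saved))
# ['PBWM', 'read ticket', 'Global', 'FCV', 'RSTL', 'collection']
```

In this model the read ticket is taken before the RSTL, which is the out-of-order acquisition the description refers to.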

When the stepdown thread starts, it enqueues the RSTL in X mode, which jumps to the top of the queue. At the same time, there will be operations that hold the RSTL in IX mode but are waiting to acquire read tickets, which prevents the stepdown thread from proceeding. If all read tickets in the system are exhausted, these threads are stuck waiting while holding the RSTL, and the threads holding the read tickets cannot make progress either, because they are queued behind the stepdown thread waiting for the RSTL.
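
The circular wait in this scenario can be illustrated with a small wait-for graph. The sketch below is only a model: the thread names (stepdown, op_A, op_B) and the find_cycle helper are hypothetical, and each thread is assumed to wait on exactly one other thread.

```python
# Illustrative wait-for graph; thread names and the helper are hypothetical.
def find_cycle(waits_for, start):
    """Follow wait-for edges from `start` and return the cycle, or None."""
    path = []
    seen = set()
    node = start
    while node is not None and node not in seen:
        seen.add(node)
        path.append(node)
        node = waits_for.get(node)
    if node is None:
        return None  # the chain ended at a thread that is not waiting
    # `node` was visited before, so the path from it back to itself is a cycle.
    return path[path.index(node):] + [node]


waits_for = {
    "stepdown": "op_A",  # wants the RSTL in X mode; op_A holds it in IX mode
    "op_A": "op_B",      # wants a read ticket; op_B holds the last one
    "op_B": "stepdown",  # wants the RSTL in IX mode; queued behind the X request
}

print(find_cycle(waits_for, "stepdown"))
# ['stepdown', 'op_A', 'op_B', 'stepdown']
```

Here op_A stands for an operation that reacquired the RSTL correctly and is waiting for a ticket, while op_B stands for an operation that took its ticket before reacquiring the RSTL.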

There is also a variation of this that can happen on step up when we are holding the RSTL and waiting on ticket acquisition.

We should be accounting for the FCV lock when we restore locks.
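
One way to picture "accounting for the FCV lock" is the sketch below. It is an assumption about the shape of the change, not the committed fix: locks are plain strings, the global lock is assumed to be stored separately from the saved list, and restore_order_with_fcv_check is a hypothetical name.

```python
# Simplified model of a restore path that checks for the FCV lock.
# NOT the committed fix; names and structure are assumptions.
PBWM, FCV, RSTL = "PBWM", "FCV", "RSTL"


def restore_order_with_fcv_check(saved):
    """Return the reacquisition order when the FCV lock is checked for.

    `saved` lists the locks held before the yield, in sorted order.
    """
    order = []
    i = 0
    # Reacquire, in this order, whichever of these locks were held:
    # the PBWM, then the FCV lock, then the RSTL.
    for expected in (PBWM, FCV, RSTL):
        if i < len(saved) and saved[i] == expected:
            order.append(saved[i])
            i += 1
    # Only now take the global lock, which includes taking a read ticket.
    order.append("read ticket")
    order.append("Global")
    # Reacquire the remaining locks.
    order.extend(saved[i:])
    return order


print(restore_order_with_fcv_check([PBWM, FCV, RSTL, "collection"]))
# ['PBWM', 'FCV', 'RSTL', 'read ticket', 'Global', 'collection']
```

With the extra check, the model holds the RSTL before it waits for a read ticket, regardless of whether the FCV lock was in the saved state.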

is caused by

SERVER-65821 Deadlock during setFCV when there are prepared transactions that have not persisted commit/abort decision

Closed

related to

SERVER-84353 The test for stepDown deadlock with read ticket exhaustion is flaky

Closed

SERVER-75262 Add a passthrough test that exercises ticket exhaustion

Closed

Assignee: Matt Kneiser
Reporter: Samyukta Lanka
Participants: Githook User, Matt Kneiser, Samyukta Lanka
Votes: 0
Watchers: 45

Created: Mar 23 2023 07:55:44 PM UTC
Updated: May 14 2025 07:12:02 AM UTC
Resolved: Mar 29 2023 06:18:06 PM UTC
Confidence Status Last Update: 24/Mar/23 2:12 PM
