Type: Bug
Resolution: Gone away
Priority: Major - P3
Affects Version/s: None
Component/s: None
ALL
After SERVER-66972 (which delegates the refresh of the database version to another thread), the JS test retryable_findAndModify_commit_and_abort_prepared_txns_after_failover_and_restart.js started to deadlock on prepared transactions after a node steps down. The function that refreshes the local database metadata tries to acquire an X lock on the database, but it waits indefinitely because a prepared transaction has acquired, and still holds, an IX lock on the same database.
This is blocking the fix for SERVER-66972 (Database critical section does not serialize with ongoing refreshes).
In detail, the JS test runs the command below, which never returns:
{ "abortTransaction" : 1, "lsid" : { "id" : UUID("372cc71c-e185-409d-ad2e-465e053764a4"), "txnNumber" : NumberLong(35), "txnUUID" : UUID("e8666f6d-e097-4ccd-ad30-608b7494aebc") }, "txnNumber" : NumberLong(0), "autocommit" : false }
From the lock manager dump I see that the RefreshDbVersionThread is waiting to acquire an X lock on the config database, while that resource is already IX-locked by another thread working on transaction e8666f6d-e097-4ccd-ad30-608b7494aebc:
{ "lockAddr":"0x7f21e5c89b20", "resourceId":"{6237343057549539649: Database, 1625657039122151745, config}", "granted":[ { "lockRequest":"0x7aa", "lockRequestAddr":"0x7f21e5cb6020", "thread":"thread::id of a non-executing thread", "mode":"IX", "convertMode":"NONE", "enqueueAtFront":false, "compatibleFirst":false, "debugInfo":"lsid: { id: UUID(\"372cc71c-e185-409d-ad2e-465e053764a4\"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855), txnNumber: 35, txnUUID: UUID(\"e8666f6d-e097-4ccd-ad30-608b7494aebc\") }" } ], "pending":[ { "lockRequest":"0x911", "lockRequestAddr":"0x7f21ec477820", "thread":"139783425423104", "mode":"X", "convertMode":"NONE", "enqueueAtFront":false, "compatibleFirst":false, "debugInfo":"", "clientInfo":{ "desc":"RefreshDbVersionThread", "opid":2311 } }, ... ] }
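The conflict visible in the dump can be reproduced with a small sketch. It is not MongoDB code: the `COMPATIBLE` table is the textbook multi-granularity compatibility matrix, `blocked_requests` is a hypothetical helper, and `entry` is a hand-trimmed copy of the dump above that keeps only the fields the helper reads. It checks mode compatibility only and ignores queue ordering and fairness policies that a real lock manager applies.

```python
# Textbook multi-granularity lock compatibility: each mode maps to the set
# of modes that may be held concurrently on the same resource.
COMPATIBLE = {
    "IS": {"IS", "IX", "S"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "X": set(),  # exclusive: compatible with nothing
}


def blocked_requests(lock_entry):
    """Return (requester, wanted mode, held mode) for every pending request
    whose mode conflicts with an already granted request.

    Hypothetical helper; field names follow the dump shown above.
    """
    blocked = []
    for pending in lock_entry["pending"]:
        for granted in lock_entry["granted"]:
            if granted["mode"] not in COMPATIBLE[pending["mode"]]:
                desc = pending.get("clientInfo", {}).get("desc", "unknown")
                blocked.append((desc, pending["mode"], granted["mode"]))
    return blocked


# Hand-trimmed copy of the dump entry (assumption: only these fields matter).
entry = {
    "resourceId": "{6237343057549539649: Database, 1625657039122151745, config}",
    "granted": [
        {"mode": "IX", "thread": "thread::id of a non-executing thread"},
    ],
    "pending": [
        {"mode": "X", "clientInfo": {"desc": "RefreshDbVersionThread", "opid": 2311}},
    ],
}

for desc, wanted, held in blocked_requests(entry):
    print(f"{desc} wants {wanted} but {held} is already granted")
```

Because X is compatible with no other mode, the refresh thread cannot make progress until the prepared transaction's IX lock is released, which matches the indefinite wait described above.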
Additional details:
This problem looks similar to (but is not the same as) SERVER-62951.
- is related to: SERVER-66972 Database critical section does not serialize with ongoing refreshes (Closed)
- related to: SERVER-69108 SCCL can immediately return config and admin metadata without triggering a refresh (Closed)