- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- None
- Fully Compatible
- ALL
- Execution Team 2021-01-11
- 22
dbHash is allowed to hold storage snapshots open indefinitely while waiting for collection locks. Multi-document transactions also hold snapshots open while waiting for locks, but they are subject to lock acquisition deadlines.
This behavior introduces a very specific livelock in 4.4:
- dbHash opens a read snapshot using the current cluster time
- An index build aborts. While holding an X collection lock, the index build attempts to set a ghost commit timestamp for the catalog write using the same cluster time.
- Due to an assertion in WiredTiger (WT), setting the ghost timestamp fails because there is an open transaction (dbHash) reading at the same timestamp.
- The index build retries indefinitely, waiting for the dbHash reader to finish.
- The dbHash operation is unable to make progress because it is blocked by the X lock.
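The circular dependency in the steps above can be sketched as a small wait-for graph. This is only an illustrative model, not server code: the names `waits_for` and `find_cycle` are hypothetical, and the graph simply encodes "who is waiting on whom" as described in this ticket. Because the index build actively retries rather than sleeping on a lock, the server sees a livelock rather than a classic deadlock, but the dependency cycle is the same.

```python
# Hypothetical model of the wait-for relationships described in this ticket.
# Keys wait on values; none of these names come from the server code base.
waits_for = {
    "dbHash": "index_build",   # dbHash is blocked by the X collection lock
    "index_build": "dbHash",   # the index build retries until dbHash's snapshot closes
}


def find_cycle(graph, start):
    """Follow wait-for edges from `start`; return the cycle if one is reached."""
    seen = []
    node = start
    while node in graph:
        if node in seen:
            # We came back to a node already on the path: that suffix is the cycle.
            return seen[seen.index(node):]
        seen.append(node)
        node = graph[node]
    return None  # the chain ended, so nobody waits forever


cycle = find_cycle(waits_for, "dbHash")
print(cycle)
```

Removing either edge (for example, by having dbHash give up its snapshot) breaks the cycle and lets the other operation proceed.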
In general, I believe we should impose a lock timeout so that dbHash cannot hold snapshots open and block indefinitely, much like we already do for multi-document transactions.
The alternative to imposing a lock deadline would be to fix ghost timestamps, but only in 4.4. I believe a dbHash change avoids the risk of modifying 4.4 index build code that has already been removed in master. Adding a lock timeout to dbHash assumes that the index build ghost timestamping behavior has no other consequences. The same bug applies to background validation, which will need the same lock timeout change (SERVER-53445).
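As a rough sketch of the proposed behavior, the following models a reader that gives up and closes its snapshot when it cannot acquire the collection lock within a deadline. Everything here is an assumption made for illustration: `Snapshot`, `LockTimeout`, and `db_hash_with_deadline` are hypothetical names, a plain `threading.Lock` stands in for the collection lock, and the timeout value is arbitrary. It is not how the server implements lock deadlines.

```python
import threading


class LockTimeout(Exception):
    """Raised when the lock is not acquired before the deadline (hypothetical)."""


class Snapshot:
    """Stand-in for an open storage snapshot at a given timestamp (hypothetical)."""

    def __init__(self, timestamp):
        self.timestamp = timestamp
        self.open = True

    def close(self):
        self.open = False


def db_hash_with_deadline(collection_lock, snapshot, timeout_secs):
    """Try to take the lock; on timeout, close the snapshot and fail the operation."""
    if not collection_lock.acquire(timeout=timeout_secs):
        # Releasing the snapshot is what lets the blocked writer make progress.
        snapshot.close()
        raise LockTimeout("could not acquire collection lock in time")
    try:
        # Placeholder for the real hashing work done under the lock.
        return "hashed@%d" % snapshot.timestamp
    finally:
        collection_lock.release()
        snapshot.close()
```

With a deadline in place, the reader fails with an error instead of waiting forever, and the caller can retry the command once the conflicting operation has finished.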
- is related to
  - SERVER-53445 [4.4] impose lock acquisition timeout for background validation (Closed)
  - SERVER-57192 [4.4] Lower dbHash and background validation lock acquisition timeouts (Closed)
- related to
  - SERVER-58969 [4.4] Lower dbHash and background validation lock acquisition timeouts (Closed)