-
Type: Bug
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: 5.1.0, 4.4.6, 5.0.0-rc8
-
Component/s: Storage
-
None
-
Fully Compatible
-
ALL
-
v5.1, v5.0, v4.4
-
Execution Team 2021-08-09, Execution Team 2021-08-23, Execution Team 2021-09-06, Execution Team 2021-09-20, Execution Team 2021-10-18, Execution Team 2021-11-01, Execution Team 2021-11-15
-
135
MongoDB generates RecordIds from an auto-incrementing integer. On restart or after rollback, initialization opens a cursor (over data at the stable timestamp) and takes one reverse step to find the largest RecordId currently in use by a document.
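A minimal sketch of that initialization, assuming a toy model in which the record store is just the set of RecordIds visible at the stable timestamp. The function name `next_record_id` and the choice to start at 1 for an empty collection are illustrative assumptions, not the server's actual code.

```python
def next_record_id(visible_record_ids):
    """Return the next RecordId to hand out.

    visible_record_ids: iterable of integer ids visible at the stable
    timestamp (hypothetical stand-in for a cursor over the table).
    """
    # Equivalent of one reverse cursor step: look at the largest visible key.
    largest = max(visible_record_ids, default=0)
    # The counter resumes one past the largest id found (assumed to start at 1).
    return largest + 1


# Ids 1..3 are visible, so the next id issued would be 4.
next_id = next_record_id([1, 2, 3])
```

Note that ids whose documents are deleted as of the stable timestamp are not visible to this lookup, so the counter can resume at an id that was used before.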
Now consider the following sequence with durable history (RID = RecordId, TS = timestamp):
- Set OldestTimestamp to 1
- Insert RID(1) -> A at TS(10)
- Insert RID(2) -> B at TS(20)
- Delete RID(2) (B) at TS(30)
If we were to restart and initialize the next RecordId, we'd start assigning new documents RID(2). Normally this is fine: any new replicated user write must be generated with a timestamp larger than 30, so the update chain for RID(2) remains valid.
However, when reconstructing prepared transactions, the prepare timestamp (and thus any following commit timestamp, but not the durable timestamp) may be arbitrarily old.
In this example, after initializing the next RID to 2, if we were to reconstruct a prepared transaction from TS(10) that performs an insert on this collection, we'd get the following update chain (from oldest to newest):
RID(2) => B @ TS(20) -> <tombstone> @ TS(30) -> PreparedInsert @ TS(10)
Committing the prepared insert at a timestamp between 10 and 30 produces incorrect results and inconsistent data when reading at those timestamps.
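The effect can be illustrated with a deliberately simplified model of an update chain. This is a sketch under stated assumptions, not WiredTiger's implementation: the chain is a list in insertion order, a read walks it newest-first and returns the first update whose timestamp is at or below the read timestamp, and the document `"C"` and commit timestamp 15 are hypothetical example values.

```python
TOMBSTONE = object()  # marker for a delete in the toy update chain


def read_at(chain, read_ts):
    """Return the value visible at read_ts, or None if nothing is visible.

    chain: list of (timestamp, value) pairs in insertion order, oldest first.
    """
    # Walk from the newest update to the oldest, as the model assumes.
    for ts, value in reversed(chain):
        if ts <= read_ts:
            return None if value is TOMBSTONE else value
    return None


# History for RID(2) before the restart: B @ TS(20), tombstone @ TS(30).
chain = [(20, "B"), (30, TOMBSTONE)]
before = read_at(chain, 25)  # B is visible between TS(20) and TS(30)

# Reconstructed prepared insert reuses RID(2) and commits at TS(15),
# so an older timestamp lands at the newest end of the chain.
chain.append((15, "C"))
after = read_at(chain, 25)  # the out-of-order update now hides B
```

In this model the same read at TS(25) returns `"B"` before the prepared insert is committed and `"C"` afterwards, which is the kind of inconsistency the description refers to.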
- causes
-
SERVER-62650 RecordStore RecordId initialization can deadlock transactions with cache eviction
- Closed
- depends on
-
WT-8241 Skip value return for largest key
- Closed
-
WT-8226 Fix largest_key failed to consider prepared update
- Closed
-
WT-7918 Allow setting the prepare timestamp smaller than or equal to the latest active read timestamp with roundup prepare config
- Closed
-
WT-7992 Provide API to return the last key in a table regardless of visibility
- Closed
- is depended on by
-
WT-8114 Revert allow setting the prepare timestamp smaller than or equal to the latest active read timestamp with roundup prepare config
- Closed
- related to
-
WT-7783 Fix RTS to restore tombstone when an on-disk update is out of order prepare update
- Closed
-
WT-7815 Properly initialize prev_upd_ts for ordered timestamp assertion
- Closed
-
WT-7820 Retrieve the on-disk durable timestamp to compare with newer update timestamp
- Closed