-
Type: Bug
-
Resolution: Unresolved
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
This ticket is a follow-on for WT-8392. The signature of the failure is:
[2021/11/11 05:28:34.215] CONFIG: test_timestamp_abort -s -h WT_TEST.timestamp-abort -T 5 -t 10
[2021/11/11 05:28:34.215] Kill child
[2021/11/11 05:28:34.215] Open database, run recovery and verify content
[2021/11/11 05:28:34.215] Got stable_val 228976
[2021/11/11 05:28:34.215] records-1: LOCAL no record with key 1000024190
[2021/11/11 05:28:34.215] LOCAL: 1 record(s) absent from 117316
There have been two failures on ubuntu2004-small hosts, both in test_timestamp_abort -s (i.e. the stress variant), where the local record is absent from the local table after crash and recovery. The failure has never occurred in reproduction attempts and has been seen only twice in several months.
The suspicion is that there is a file system bug. Both failures indicate that the local update completed, which means it wrote its insert into the WT log, and that log record would have been written to the OS buffer cache before the call returned. Only then does the application write the record into its text file, so a key that appears in the text file should already have a log record available to recovery.
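The ordering argument above can be sketched as follows. This is a minimal illustration in Python, not the test's actual code: `local_update`, `absent_keys`, the `insert <key>` payload and the one-key-per-line text file are all assumptions made for the example. It only shows why a key found in the text file implies an earlier log write that returned successfully.

```python
import os
import tempfile


def local_update(log_fd, log_offset, records_path, key):
    """Hypothetical update: write the log record first, then note the key."""
    payload = f"insert {key}\n".encode()
    # pwrite returns once the data is in the OS buffer cache; no fsync here.
    written = os.pwrite(log_fd, payload, log_offset)
    if written != len(payload):
        raise OSError("short write to log")
    # Only after the log write has returned is the key recorded in the text file.
    with open(records_path, "a") as records:
        records.write(f"{key}\n")
    return log_offset + written


def absent_keys(records_path, recovered_keys):
    """Return keys listed in the text file that recovery did not bring back."""
    with open(records_path) as records:
        expected = [int(line) for line in records if line.strip()]
    return [key for key in expected if key not in recovered_keys]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as home:
        fd = os.open(os.path.join(home, "log"), os.O_RDWR | os.O_CREAT, 0o600)
        records_path = os.path.join(home, "records-1")
        offset = local_update(fd, 0, records_path, 1000024189)
        offset = local_update(fd, offset, records_path, 1000024190)
        os.close(fd)
        # Pretend recovery only returned the first key.
        print(absent_keys(records_path, {1000024189}))
```

Under these assumptions, a non-empty result from `absent_keys` corresponds to the "record(s) absent" line in the failure signature: the key reached the text file, yet its log record was not seen by recovery.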
In WT-8392, debugging was added and turned on for stress runs to record pwrite operations and print the thread and key written. The output will show whether pwrite succeeded, along with the offset and length of the record in the log file.
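As a rough picture of what such tracing captures, here is a minimal Python sketch. It is not the WiredTiger debugging code added in WT-8392: the `traced_pwrite` name and the trace entry fields are hypothetical, chosen to mirror the items listed above (thread, key, success, offset, length).

```python
import os
import tempfile
import threading


def traced_pwrite(fd, data, offset, key, trace):
    """Call os.pwrite and append one trace entry describing the outcome."""
    try:
        written = os.pwrite(fd, data, offset)
        error = None
    except OSError as exc:
        written = -1
        error = exc.strerror
    trace.append({
        "thread": threading.get_ident(),  # which thread issued the write
        "key": key,                       # key carried by the log record
        "offset": offset,                 # position in the log file
        "length": len(data),              # bytes requested
        "written": written,               # bytes reported written, -1 on error
        "ok": written == len(data),       # full write succeeded
        "error": error,
    })
    return written


if __name__ == "__main__":
    trace = []
    with tempfile.TemporaryDirectory() as home:
        fd = os.open(os.path.join(home, "log"), os.O_RDWR | os.O_CREAT, 0o600)
        traced_pwrite(fd, b"insert 1000024190\n", 0, 1000024190, trace)
        os.close(fd)
    for entry in trace:
        print(entry)
```

With a trace like this, a missing key can be matched to a specific write: if the entry shows a full, successful write at a known offset and the record is still absent after recovery, the loss happened below the application.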
- related to: WT-8392 Add debugging to catch missing log record (Closed)