- Type: Bug
- Resolution: Fixed
- Priority: Critical - P2
- Affects Version/s: None
- Component/s: None
- 13
- Storage Engines 2018-10-08, Storage Engines 2018-10-22, Storage Engines 2018-11-05, Storage Engines 2018-11-19, Storage Engines 2018-12-03, Storage Engines 2018-12-17, Storage Engines 2018-12-31, Storage Engines 2019-01-28, Storage Engines 2019-02-11, Storage Engines 2019-02-25, Storage Engines 2019-03-11, Storage Engines 2019-03-25
- v4.0, v3.6
ISSUE SUMMARY
Eviction sometimes writes versions of values to data files that are newer than the versions a running checkpoint is allowed to see (this is termed "skew newest" eviction). When that happens, the checkpoint must revisit all of those pages and write the expected versions.
The issue was introduced in WT-4094.
USER IMPACT
This problem means that checkpoints could contain inconsistent content. It is only possible if cache overflow (lookaside) is in use. Any checkpoint created while the lookaside file is being used could suffer from this issue, which can result in data loss under either of the following conditions:
1. Data files are copied from a live system, or
2. The system is restarted after a shutdown.
WORKAROUNDS
There is currently no workaround for this issue.
AFFECTED VERSIONS
MongoDB 3.6.6+, 4.0.0+
FIX VERSION
MongoDB 3.6.12, MongoDB 4.0.9
RESOLUTION DETAILS
There were cases where the transaction ID associated with such a reconciliation was being set in a way that allowed a checkpoint to skip those pages. As a result, a checkpoint could be created with invalid content.
Failure conditions
The following events need to happen for this failure to occur:
1. Checkpoint starts.
2. A transaction T transitions a page from clean to dirty.
3. The page is evicted to lookaside in "skew newest" mode.
4. The checkpoint does not rewrite the page.
At this point, the checkpoint on disk is inconsistent because it contains part of transaction T but not all of it.
The next checkpoint or a clean shutdown would normally rewrite this page. However, due to an optimization, these checkpoints may be skipped if the stable timestamp has not changed.
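The four steps above can be illustrated with a toy model. This is only a sketch, not WiredTiger code: the `Page` class, the `run` function, the `checkpoint_revisits_evicted_pages` flag and the integer transaction IDs are all hypothetical names chosen for illustration, and the real skip decision is based on the transaction ID recorded for the reconciliation rather than on a simple set of evicted pages.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    chain: list = field(default_factory=list)  # (txn_id, value), oldest first
    disk_value: object = None                  # what the data file holds
    dirty: bool = False

    def update(self, txn_id, value):
        self.chain.append((txn_id, value))
        self.dirty = True

    def write(self, visible):
        # Write the newest version accepted by the visible() predicate.
        for txn_id, value in reversed(self.chain):
            if visible(txn_id):
                self.disk_value = value
                break
        self.dirty = False


def run(checkpoint_revisits_evicted_pages):
    # Hypothetical setup: two pages, both holding "old" from transaction 1.
    pages = {"a": Page(), "b": Page()}
    for page in pages.values():
        page.update(txn_id=1, value="old")
        page.write(lambda txn_id: True)

    snapshot_max = 1                           # 1. checkpoint starts; sees txn ids <= 1
    for page in pages.values():
        page.update(txn_id=2, value="new")     # 2. transaction T dirties clean pages

    pages["a"].write(lambda txn_id: True)      # 3. skew-newest eviction writes T's value
    evicted = {"a"}

    for name, page in pages.items():           # 4. checkpoint walks the pages
        if page.dirty or (checkpoint_revisits_evicted_pages and name in evicted):
            page.write(lambda txn_id: txn_id <= snapshot_max)

    return {name: page.disk_value for name, page in pages.items()}


print(run(checkpoint_revisits_evicted_pages=False))
print(run(checkpoint_revisits_evicted_pages=True))
```

In this model, the first call leaves page "a" holding T's value while page "b" holds the pre-T value, which is the "part of transaction T but not all of it" state described above. The second call forces the checkpoint to rewrite the evicted page with the version its snapshot can see.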
Test failure description
A data mismatch between a COL table and a ROW table was detected in the wiredtiger-test-checkpoint job on 'kodkod'. A similar failure was reported in WT-4244 (which was closed as a duplicate of WT-4239).
http://build.wiredtiger.com:8080/job/wiredtiger-test-checkpoint/3785/
+ nice ./test/checkpoint/t -t m -n 1000000 -k 5000000 -C cache_size=100MB
t: process 11308
1: 1 workers, 3 tables
checkpointer thread starting: tid: 11308:0x7f3af6786700
worker thread starting: tid: 11308:0x7f3aef7fe700
Finished a checkpoint
Finished verifying a checkpoint with 3 tables and 0 keys
Finished a checkpoint
Finished verifying a checkpoint with 3 tables and 87 keys
Finished a checkpoint
...
Finished verifying a checkpoint with 3 tables and 443504 keys
t: 1st cursor didn't find 2nd key: WT_NOTFOUND: item not found
t: verify_checkpoint - mismatching data: Bad address
Finished a checkpoint
...
Finished verifying a checkpoint with 3 tables and 470602 keys
Finished a checkpoint
Key/value mismatch: 657681/0000000000000000000000000000000375492 from a COL table is not 657675/0000000000000000000000000000000497345 from a ROW table
Ran workers for: 198.422610 seconds
Closing connection
+ cleanup
+ status=14
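The "Key/value mismatch" line in the output above is the kind of result a lockstep comparison of two tables produces. The actual verification code in test/checkpoint is not shown in this ticket, so the following is only an illustrative sketch. It assumes each table can be read as a sorted sequence of (key, value) pairs, and `first_mismatch` is a hypothetical helper name.

```python
from itertools import zip_longest


def first_mismatch(col_items, row_items):
    """Walk two sorted (key, value) sequences in lockstep.

    Returns the first differing pair as (col_item, row_item), with None
    standing in for a table that ran out of entries, or None if the
    two sequences are identical.
    """
    for col_item, row_item in zip_longest(col_items, row_items):
        if col_item != row_item:
            return col_item, row_item
    return None


# Hypothetical data: the COL table is missing key 3, so the cursors diverge there.
col_table = [(1, "a"), (2, "b"), (4, "d")]
row_table = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
print(first_mismatch(col_table, row_table))
```

A missing key in one table surfaces as a pair with different keys, as in the log above, where key 657681 from the COL table was compared against key 657675 from the ROW table.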
is caused by
- WT-4094 Enable lookaside eviction while checkpoints are running (Closed)