ISSUE SUMMARY
Under extremely rare circumstances, a race condition in the code that updates large records may cause some of those updates to be lost during an unclean shutdown.
On a production system, the code path containing the race condition is taken only when a WiredTiger log record is 128 KB or larger. From MongoDB's perspective the triggering record size is smaller, perhaps around 40 KB, because a single WiredTiger log record contains the inserts into the collection, its indexes, the oplog, and so on.
Attempts to trigger this race condition in MongoDB using a synthetic workload with compression disabled have produced mixed results. Attempts to reproduce it in MongoDB with the default compression (snappy) have been unsuccessful.
This issue only affects users running with journaling enabled. Users who run with journaling disabled cannot be affected by this bug.
USER IMPACT
If the race condition is triggered and the node suffers an unclean shutdown, some updates to large records made since the last checkpoint may be lost. Unfortunately, it is not possible to detect whether the race condition has been triggered.
AFFECTED VERSIONS
MongoDB 3.2 versions up to and including MongoDB 3.2.7.
REMEDIATION
A fix for this issue is included in the MongoDB 3.2.8 production release. Users whose workloads include updates to large records, and whose nodes may be subject to unclean shutdowns, should upgrade to MongoDB 3.2.8 to avoid exposure to this issue.
WORKAROUNDS
Unfortunately there are no known workarounds for this issue.
Original description
Hi!
After rebuilding WiredTiger with diagnostic enabled, one of our tests started to fail.
The test checks the ability of the DB to recover after an application crash.
Please see attached minimized test:
$ ./recovery-test-mp
5 writer threads spawned
killing child
checking DB...
no record with key 28363
no record with key 3689348814741930043
no record with key 3689348814741983775
no record with key 7378697629483839817
no record with key 7378697629483894622
no record with key 11068046444225735421
no record with key 14757395258967669044
no record with key 14757395258967726182
8 record(s) absent from total of 544769
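For readers without the attachment, the checking phase that produces a report like the one above can be sketched roughly as follows. This is only an illustration and is not taken from the attached test: the function name `report_missing` and the use of plain Python collections are hypothetical, and it assumes the reported total is the number of distinct keys the writers expected to find after recovery.

```python
def report_missing(expected_keys, recovered_keys):
    """Return report lines for keys that were written but not recovered.

    Hypothetical helper: expected_keys are the keys the writers believe
    were durably inserted before the crash, and recovered_keys are the
    keys actually present after recovery.
    """
    expected = set(expected_keys)
    recovered = set(recovered_keys)
    # Keys that should exist after recovery but do not.
    missing = sorted(expected - recovered)
    lines = [f"no record with key {key}" for key in missing]
    if missing:
        # Assumption: the total is the count of distinct expected keys.
        lines.append(
            f"{len(missing)} record(s) absent from total of {len(expected)}"
        )
    return lines


if __name__ == "__main__":
    for line in report_missing([1, 2, 3], [1, 3]):
        print(line)
```

An empty result means every expected key survived the crash; any other result indicates lost updates of the kind described in this ticket.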
I was unable to reproduce the problem without diagnostic enabled.
Thanks!
- related to: WT-2184 lost records after crash (Closed)