ISSUE DESCRIPTION AND IMPACT
MongoDB versions 4.4.3 and 4.4.4 can sometimes record incorrect checkpoint metadata. Starting in versions 4.4.8 and 5.0.2, WiredTiger uses that incorrect metadata at startup, which can lead to data corruption.
Upgrading directly from MongoDB versions 4.4.3 or 4.4.4 to any MongoDB version 4.4.8+ or 5.0.2+ can leave data in an inconsistent state. This ticket tracks the implementation of a safe, direct upgrade path; the fix is included starting in MongoDB versions 4.4.11 and 5.0.6.
DIAGNOSIS
This issue can cause a Duplicate Key error on startup that prevents the node from starting.
However, a node that starts successfully may still be impacted in the following ways:
- Data inconsistency within documents - specific field values may not correctly reflect writes that were acknowledged to the application prior to the shutdown time. Documents that should have been deleted may also still exist.
- Incomplete query results - lost or inaccurate index entries may cause incomplete query results for queries that use impacted indexes.
- Missing documents - documents may be lost on impacted nodes.
Impact on a node that starts successfully can be checked by running the validate command. The output from validate reveals the impact by reporting on inconsistencies found between documents and indexes in the form of:
- Extra index entries (including duplicate entries in unique indexes)
- Missing index entries
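The check described above can be sketched in Python. This is a minimal illustration, not an official remediation tool. The helper names `summarize_validate` and `find_suspect_collections` are hypothetical, and the `db` argument is assumed to be a PyMongo-style database object whose `command("validate", name, full=...)` call returns the raw validate result. The result fields read here (`valid`, `extraIndexEntries`, `missingIndexEntries`, `errors`) are assumed to be present in the validate output of the server version in use.

```python
from typing import Any, Dict, Iterable, List


def summarize_validate(result: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the parts of a validate result relevant to this issue.

    Assumes the result may carry `valid`, `extraIndexEntries`,
    `missingIndexEntries`, and `errors`; absent fields count as empty.
    """
    return {
        "valid": bool(result.get("valid", False)),
        "extra_index_entries": len(result.get("extraIndexEntries", [])),
        "missing_index_entries": len(result.get("missingIndexEntries", [])),
        "errors": list(result.get("errors", [])),
    }


def find_suspect_collections(
    db: Any, names: Iterable[str], full: bool = False
) -> List[str]:
    """Return the collections whose validate output shows inconsistencies.

    `db` is assumed to expose a PyMongo-style `command` method. Whether to
    pass `full=True` is left to the caller; it is an assumption of this
    sketch, not a recommendation from the ticket.
    """
    suspects = []
    for name in names:
        result = db.command("validate", name, full=full)
        summary = summarize_validate(result)
        if (
            not summary["valid"]
            or summary["extra_index_entries"]
            or summary["missing_index_entries"]
        ):
            suspects.append(name)
    return suspects
```

With PyMongo, `names` could come from `db.list_collection_names()`. Views cannot be validated, so they may need to be filtered out first.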
REMEDIATION AND WORKAROUNDS
For clusters still on versions 4.4.3 and 4.4.4: it is possible to avoid this issue by upgrading directly to 4.4.11+ or 5.0.6+.
Use the following list to determine the recommended response to this issue:
- Clusters on versions 4.4.0, 4.4.1, and 4.4.2 are safe to upgrade to 4.4.8+ or 5.0.2+ but should upgrade to recommended versions 4.4.10+ or 5.0.4+.
- Clusters on versions 4.4.3 or 4.4.4 should upgrade directly to versions 4.4.11+ or 5.0.6+.
- Clusters running versions 4.4.5-4.4.7 can and should upgrade to 4.4.10+ or 5.0.4+.
Be aware that WT-7995 affects versions 4.4.2-4.4.8 and requires its own remediation.
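The recommendations in the list above can be encoded as a small lookup. The following is a sketch only: the function names `parse_version` and `recommended_targets` are hypothetical, version strings are assumed to be plain `major.minor.patch` values, and versions not covered by the list return `None` rather than a guess.

```python
from typing import Optional, Tuple

Version = Tuple[int, int, int]


def parse_version(version: str) -> Version:
    """Parse a 'major.minor.patch' string into a comparable tuple."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def recommended_targets(version: str) -> Optional[Tuple[str, str]]:
    """Return the minimum recommended (4.4.x, 5.0.x) upgrade targets.

    Mirrors the list above; returns None for versions the list
    does not mention.
    """
    v = parse_version(version)
    if (4, 4, 3) <= v <= (4, 4, 4):
        # Affected versions: upgrade directly to a release with the fix.
        return ("4.4.11", "5.0.6")
    if (4, 4, 0) <= v <= (4, 4, 2) or (4, 4, 5) <= v <= (4, 4, 7):
        return ("4.4.10", "5.0.4")
    return None
```

For example, `recommended_targets("4.4.4")` yields the 4.4.11 and 5.0.6 minimums, while a version such as 4.2.0 falls outside the list and yields `None`. This lookup does not account for WT-7995, which requires its own remediation.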
For clusters that have already upgraded to 4.4.8+ from versions 4.4.3 and 4.4.4:
- If you previously followed remediation steps for WT-7995 and detected corruption, you will have remediated any corruption that occurred as part of this bug.
- If you have not validated all collections since upgrading to 4.4.8+ from 4.4.3 or 4.4.4, we recommend validating all collections.
If corruption is detected, data can be recovered from other nodes in the replica set. This may be operationally intensive. If an unaffected node cannot be readily identified, these scripts can assist the remediation of this bug. Please use these scripts with care, and consult the README thoroughly before use.
- causes
  - WT-8534 Allow retrieving checkpoint snapshot for backup restore recovery (Closed)
- is related to
  - SERVER-79265 Update MongoDB-WiredTiger-Log version numbers chart (Closed)
- related to
  - WT-7995 Fix the global visibility so that it cannot go beyond checkpoint visibility (Closed)
  - WT-6671 Save the checkpoint snapshot that is used to take checkpoint in the metadata (Closed)
  - WT-6673 RTS fix inconsistent checkpoint by removing updates outside of the checkpoint snapshot (Closed)
  - WT-8597 Understand why version 4.4.2 of MongoDB isn't susceptible to data loss on upgrade (Closed)
  - WT-7784 Enable RTS to use checkpoint snapshot on timestamp tables (Closed)