- Type: Improvement
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: 3.0.5
- Component/s: WiredTiger
- Minor Change
- v4.0
- Storage NYC 2018-06-18, Storage NYC 2018-09-10, Storage NYC 2018-09-24
ISSUE DESCRIPTION AND IMPACT
The mongod --repair option was originally introduced for use with the MMAPv1 storage engine. When it is used with WiredTiger, attempts to recover a corrupted dbpath via mongod --repair can fail in a number of specific scenarios.
Enhanced repair functionality allows mongod --repair to recover from a wider variety of faulty conditions that previously would have resulted in a repair failure. These changes do not allow the mongod to recover otherwise unretrievable data; instead, they ensure that the data set is returned to a working state with as much data as the process was able to salvage.
In addition to a more robust repair mechanism, this change adds the following new behavior:
- If the repair operation modifies data for a node in a replica set, the node will not be able to rejoin the replica set until it has been fully resynced. This behavior prevents a node that holds only partial data recovered via mongod --repair from becoming a replica set primary, which would result in data effectively going missing.
- If a repair operation fails for any reason, the node will not be able to start up again without the mongod --repair option. This precaution is included to prevent instances where the mongod is repeatedly restarted with a broken data set, potentially resulting in additional data corruption.
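For context, a repair attempt can be scripted roughly as follows. This is a minimal sketch, not part of the ticket: the helper names `build_repair_command` and `run_repair` are hypothetical, the dbpath is supplied by the caller, and the sketch makes no claim about what a particular exit code means beyond passing it back.

```python
import subprocess


def build_repair_command(dbpath, mongod_binary="mongod"):
    """Return the argument list for a repair run against `dbpath`.

    `mongod_binary` is an assumption: it defaults to whatever `mongod`
    is found on PATH.
    """
    return [mongod_binary, "--repair", "--dbpath", dbpath]


def run_repair(dbpath, mongod_binary="mongod"):
    """Run the repair and return the process exit code unchanged.

    check=False keeps subprocess from raising on a non-zero exit, so the
    caller decides how to react (for example, by re-running with --repair
    or by resyncing the node).
    """
    completed = subprocess.run(
        build_repair_command(dbpath, mongod_binary),
        check=False,
    )
    return completed.returncode
```

Only take a node back into service after checking whether the repair modified data, since a modified replica set member has to be fully resynced first.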
DIAGNOSIS AND AFFECTED VERSIONS
This issue is exhibited whenever a mongod --repair command fails to start the mongod and instead returns an error message. Several error messages can be returned; two of the most common are:
Fatal Assertion 28558 at src\mongo\db\storage\wiredtiger\wiredtiger_util.cpp
WiredTiger.wt: encountered an illegal file format or internal value
These are only examples. Most mongod --repair operations that fail to start the mongod exhibit this issue, whatever the exact message.
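The two messages quoted above can be searched for in captured mongod output. The sketch below is illustrative only: `KNOWN_SIGNATURES` and `find_repair_failures` are hypothetical names, the substrings are taken verbatim from the messages in this ticket, and a log with no match may still reflect this issue because other messages are possible.

```python
# Substrings copied from the two error messages quoted in this ticket.
# The list is not exhaustive; other failure messages exist.
KNOWN_SIGNATURES = (
    "Fatal Assertion 28558",
    "encountered an illegal file format or internal value",
)


def find_repair_failures(log_lines):
    """Return (line_number, signature) pairs for lines containing a known signature.

    Line numbers are 1-based to match what a text editor would show.
    """
    hits = []
    for number, line in enumerate(log_lines, start=1):
        for signature in KNOWN_SIGNATURES:
            if signature in line:
                hits.append((number, signature))
    return hits
```

Plain substring matching is used rather than a regular expression so that the backslashes in the Windows-style source path need no escaping.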
This issue affects MongoDB versions 3.0 - 4.0.2 that use the WiredTiger storage engine.
REMEDIATION AND WORKAROUNDS
Currently, the only workarounds available are to resync from a healthy node in a replica set, restore the dbpath from an earlier backup, or open a SERVER project ticket to request a manual repair attempt of the WiredTiger metadata files.
FIX VERSIONS
This issue is fixed in MongoDB 4.0.3 and in 4.1.4, and the fix will be available in the 4.2 production release.
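The affected and fixed version ranges can be expressed as a small check. This is a sketch under stated assumptions: the function names are hypothetical, only plain `major.minor.patch` strings are accepted, 4.1.x development releases before 4.1.4 are assumed to lack the fix, and the storage engine is not inspected even though only WiredTiger deployments are affected.

```python
import re


def parse_version(text):
    """Parse 'major.minor.patch' into a tuple of ints.

    Suffixed versions such as release candidates are rejected on purpose.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unsupported version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def has_enhanced_repair(version):
    """True if the version includes the fix (4.0.3+ on 4.0.x, otherwise 4.1.4+)."""
    parsed = parse_version(version)
    if parsed[:2] == (4, 0):
        return parsed >= (4, 0, 3)
    return parsed >= (4, 1, 4)


def is_affected(version):
    """True for versions from 3.0 onward that lack the fix.

    Assumption: the deployment uses WiredTiger; this function cannot tell.
    """
    return parse_version(version) >= (3, 0, 0) and not has_enhanced_repair(version)
```

For example, `is_affected("4.0.2")` is true and `is_affected("4.0.3")` is false.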
Original description
The repair loop should be more forgiving about failures such as missing files and deal with collections or indexes missing from the catalog with a big warning message.
- is duplicated by
  - SERVER-23532 WT Library Panic (Closed)
  - SERVER-22816 Corrupt metadata after unexpected shutdown -> unable to start or repair (Closed)
  - SERVER-32451 Cannot start mongod with a missing wiredTiger database (Closed)
  - TOOLS-1496 provide tool to repair corruputed database (Closed)
  - SERVER-29555 Make repair more robust, or optionally error tolerant (Closed)
  - SERVER-29557 Allow healthy databases to skip repairs (Closed)
- is related to
  - SERVER-18640 Wiredtiger does not recover from unclean shutdown (Closed)
  - SERVER-26924 Cannot start or --repair mongod because of unclean shutdown (due to running out of disk space) (Closed)
  - SERVER-36633 Use WiredTiger log file salvage to recover a corrupted journal (Closed)