Type: Bug
Resolution: Fixed
Priority: Blocker - P1
Fix Version/s: WT10.0.1, 4.4.9, 5.0.3, 4.2.18, 5.1.0-rc0, 4.0.29
Affects Version/s: None
Component/s: None
Labels:
- dup-key

Case:

Story Points: 3
Sprint: Storage - Ra 2021-08-23, Storage - Ra 2021-09-06
Backport Requested: v5.0, v4.4, v4.2, v4.0

Issue Status as of Sept 22, 2021

ISSUE DESCRIPTION AND AFFECTED VERSIONS
This issue in MongoDB 4.4.8 causes a checkpoint thread to read and persist an incomplete version of data to disk. Data in memory remains correct unless the server crashes or experiences an unclean shutdown. Then, the inconsistent checkpoint is used for recovery and introduces corruption.

The bug is triggered on cache pages that receive an update during a running checkpoint and which are evicted during the checkpoint.

DIAGNOSIS AND IMPACT
MongoDB 4.4.8 is affected. The issue is fixed in version 4.4.9.

The bug can cause a Duplicate Key error on startup and prevent the node from starting.

The validate command reveals impact by reporting on the inconsistencies created between documents and indexes, in the form of:

- extra index entries (including duplicate entries in unique indexes)
- missing index entries

After an unclean shutdown, inconsistent writes can lead to the inability to restart an impacted node due to a Duplicate Key error during startup. However, nodes can also start successfully and still be impacted.

If a node starts successfully, it may still have been impacted by:

- Data inconsistency within documents: specific field values may not correctly reflect writes that were acknowledged to the application prior to the unclean shutdown time.
- Incomplete query results: lost or inaccurate index entries may cause incomplete query results for queries that use impacted indexes.
- Missing documents: documents may be lost on impacted nodes.

REMEDIATION AND WORKAROUNDS
First, upgrade to a fixed version (MongoDB 4.4.9). Impact can be remediated on earlier versions but could re-occur.

Then, run the validate command on each collection on each node of your replica set.

If validate reports any failures, resync the impacted node from an unaffected node. If an unaffected node cannot be readily identified, these scripts can assist with remediating this bug.
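The validation pass can be scripted. The following is a minimal sketch, not an official remediation script: it assumes a PyMongo-style database handle that provides `list_collection_names` and `command`, and the helper name `validate_all_collections` is hypothetical. It reads the `valid`, `errors`, `extraIndexEntries` and `missingIndexEntries` fields of the validate result, and defaults them if they are absent. Whether to pass `full` is left to the operator; the text above only says to run validate.

```python
from typing import Any, Dict, List


def validate_all_collections(db: Any, full: bool = False) -> List[Dict[str, Any]]:
    """Run validate on every collection of one database on one node.

    `db` is assumed to behave like a pymongo Database connected directly
    to the node being checked. Returns one summary per failing collection.
    """
    failures = []
    # Views cannot be validated, so only real collections are listed.
    for name in db.list_collection_names(filter={"type": "collection"}):
        result = db.command("validate", name, full=full)
        if not result.get("valid", False):
            failures.append({
                "collection": name,
                "errors": result.get("errors", []),
                "extra_index_entries": len(result.get("extraIndexEntries", [])),
                "missing_index_entries": len(result.get("missingIndexEntries", [])),
            })
    return failures


# Hypothetical usage against a single node (requires pymongo, host is a placeholder):
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://node1.example:27017", directConnection=True)
#   for db_name in client.list_database_names():
#       print(db_name, validate_all_collections(client[db_name]))
```

An empty list for every database on a node means validate found nothing to report there; any non-empty result marks that node as a candidate for resync.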

Original description

I’ve been working backwards from checkpoint skipping a page it shouldn’t when running the test case in WT-7958. Here is what I am seeing:

1. Page P exists on disk with address A and is clean.
2. Checkpoint starts running.
3. Page P is modified, setting first_dirty_txn ahead of the checkpoint.
4. Eviction chooses P to evict (in some tree ahead of the checkpoint).
5. Eviction reconciles P.
6. The main part of reconciliation succeeds but __rec_hs_wrapup fails with EBUSY (there are various checks in __wt_hs_insert_updates when checkpoint_running == true; I’m not sure exactly which one is failing).
7. At this point, ref->addr == NULL && mod->rec_result == 0 and the block for A has been freed; the page is dirty but first_dirty_txn is ahead of the checkpoint.
8. Checkpoint skips writing P, and when it writes P’s parent, it considers P, sees the missing address and takes the WT_CHILD_IGNORE path; that is, nothing is written and the original content of P (from step 1) is missing from the checkpoint.

Note that nothing is lost in memory, so the next checkpoint (including a clean shutdown) will write P and fill in the hole.

It looks like reordering __rec_write_wrapup to call __rec_hs_wrapup before it clears out the address will fix this; I’m just checking whether there are any problems with doing that.
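To make the ordering argument concrete, here is a toy model of the sequence described above. It is a sketch only and is not WiredTiger code: `Page`, `Busy`, `hs_wrapup`, `reconcile`, `checkpoint_sees` and `run` are hypothetical names, and the history store step is assumed to always fail while a checkpoint is running, which is a simplification of the EBUSY case in the description.

```python
from dataclasses import dataclass
from typing import Optional


class Busy(Exception):
    """Stands in for the EBUSY failure of the history store step."""


@dataclass
class Page:
    addr: Optional[str]       # on-disk address, loosely modelling ref->addr
    first_dirty_txn: int = 0  # transaction that first dirtied the page
    dirty: bool = False


def hs_wrapup(checkpoint_running: bool) -> None:
    # Assumption for this sketch: the step always fails during a checkpoint.
    if checkpoint_running:
        raise Busy()


def reconcile(page: Page, checkpoint_running: bool, hs_first: bool) -> bool:
    """Return True if reconciliation completed, False if it hit Busy."""
    try:
        if hs_first:
            hs_wrapup(checkpoint_running)  # proposed order: fails before A is freed
            page.addr = None
        else:
            page.addr = None               # original order: A is freed first
            hs_wrapup(checkpoint_running)
    except Busy:
        return False
    page.addr, page.dirty = "B", False     # a new block replaces A
    return True


def checkpoint_sees(page: Page, checkpoint_txn: int) -> Optional[str]:
    """Address the parent records for the page, or None if it is ignored."""
    if page.dirty and page.first_dirty_txn > checkpoint_txn:
        # The page is skipped, so the parent can only reference its address.
        return page.addr
    return "rewritten"  # otherwise the checkpoint writes the page itself


def run(hs_first: bool) -> Optional[str]:
    page = Page(addr="A")                        # clean page on disk at A
    checkpoint_txn = 10                          # checkpoint starts
    page.dirty, page.first_dirty_txn = True, 11  # update after checkpoint start
    reconcile(page, checkpoint_running=True, hs_first=hs_first)
    return checkpoint_sees(page, checkpoint_txn)
```

In this model `run(hs_first=False)` returns `None`, meaning the parent has no address to record and the page drops out of the checkpoint, while `run(hs_first=True)` returns `"A"`, so the checkpoint still references the original content.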

is duplicated by

SERVER-60371 Fatal assertion - msgid 34437 - DuplicateKey

Closed

is related to

SERVER-60371 Fatal assertion - msgid 34437 - DuplicateKey

Closed

WT-7958 Include recovery in test/checkpoint

Closed


Assignee: Haribabu Kommi

Reporter: Michael Cahill (Inactive)

Votes: 0

Watchers: 26

Created: Aug 23 2021 05:43:33 AM UTC

Updated: Oct 29 2023 04:41:20 PM UTC

Resolved: Aug 25 2021 01:35:45 AM UTC
