ISSUE SUMMARY
When using the WiredTiger storage engine, a race condition may prevent locally committed documents from being immediately visible to subsequent read operations. This bug can affect both server and application operations. Unless it is exposed by one of the replication problems described below, it is not possible to determine whether a system has been affected by this bug without significant downtime.
USER IMPACT
Normally, after a write is committed by the storage engine, it is immediately visible to subsequent operations. A race condition in WiredTiger may prevent a write from becoming immediately visible, which can result in various problems, primarily impacting replication:
- User writes may not be immediately visible to subsequent read operations
- Replica set members may diverge and contain different data
- Replication thread(s) shut down the server with the error message "Fatal Assertion 16360" due to duplicate _id values (a unique index violation)
Deployments where a WiredTiger node is or was used as a source of data may be affected. This includes:
- replica sets where the primary node is or was running WiredTiger
- replica sets using chained replication where any node may sync from a WiredTiger node
MMAPv1-only deployments are not affected by this issue. Mixed storage engine deployments are not affected when WiredTiger nodes never become primary, or when WiredTiger secondaries are not used as a source for chained replication.
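To help assess whether a deployment falls into one of these categories, the storage engine of each replica set member can be checked via serverStatus. The following mongo shell sketch is illustrative only; the host names come from rs.status() and the loop itself is not part of this report:

    // Illustrative sketch: print the storage engine of every replica set member.
    // Assumes the shell is connected to a member of the replica set and that
    // serverStatus can be run on each member; adjust for authentication as needed.
    rs.status().members.forEach(function(m) {
        var conn = new Mongo(m.name);  // m.name is "host:port"
        var ss = conn.getDB("admin").runCommand({serverStatus: 1});
        print(m.name + " (" + m.stateStr + "): " + ss.storageEngine.name);
    });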
WORKAROUNDS
There are no workarounds for this issue. All MongoDB 3.0 users running the WiredTiger storage engine should upgrade to MongoDB 3.0.8. A 3.0.8-rc0 release candidate containing the fix for this issue is available for download.
Users experiencing the "Fatal Assertion 16360" error may restart the affected node to recover, but the condition may recur, so upgrading to 3.0.8 is strongly recommended.
AFFECTED VERSIONS
MongoDB 3.0.0 through 3.0.7 using the WiredTiger storage engine. MongoDB 3.2.0 is not affected by this issue.
FIX VERSION
The fix is included in the 3.0.8 production release.
Original description
A new test is being introduced into the FSM (concurrency) tests to check the dbHash of the database (and its collections) on all replica set nodes during the following phases of the workload (SERVER-21115):
- Workload completed, before invoking teardown
- Workload completed, after invoking teardown
Before the dbHash is computed, cluster.awaitReplication() is invoked to ensure that all nodes in the replica set have caught up.
During the development of this test, infrequent failures were observed for the remove_and_bulk_insert workload when using the wiredTiger storage engine.
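For illustration, the following is a minimal, hypothetical sketch of the kind of consistency check described above, written in the mongo shell test (jstest) style; it is not the actual SERVER-21115 test code, and the database name and node count are placeholders:

    // Sketch: compare dbHash across all replica set members after a workload.
    var rst = new ReplSetTest({nodes: 3});
    rst.startSet();
    rst.initiate();

    // ... run the workload (e.g. remove_and_bulk_insert) against rst.getPrimary() ...

    rst.awaitReplication();  // ensure all members have caught up before hashing

    var hashes = rst.nodes.map(function(node) {
        return node.getDB("test").runCommand({dbHash: 1}).md5;
    });
    hashes.forEach(function(h) {
        assert.eq(hashes[0], h, "dbHash mismatch between replica set members");
    });

    rst.stopSet();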
Issue links
- duplicates: WT-2237 Make committed changes visible immediately (Closed)
- is depended on by: SERVER-21115 Add dbHash checking to concurrency suite (Closed)
- is duplicated by: SERVER-21778 slave node crash: writer worker caught exception: E11000 duplicate key error (Closed)
- is related to: SERVER-21237 ReplSetTest.prototype.awaitReplication reads directly from the oplog collection causing false positives (Closed)
- is related to: SERVER-21645 WiredTigerRecordStore::temp_cappedTruncateAfter should set _oplog_highestSeen (Closed)
- related to: SERVER-21847 log range of operations read from sync source during replication (Closed)