Type: Bug
Resolution: Done
Priority: Critical - P2
Fix Version/s: 3.6.5, 3.7.6
Affects Version/s: None
Component/s: Replication, Write Ops
Labels:
None

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:
v3.6
Sprint:
Repl 2017-10-02, Repl 2017-10-23, Repl 2017-11-13, Repl 2017-12-04, Repl 2017-12-18, Query 2018-03-12, Query 2018-03-26, Query 2018-04-09, Query 2018-04-23
Linked BF Score:
68
Confidence Status:
None
Work Order:
0

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

Consider the following sequence of events during a batch insert of 1000 documents with ordered:true and w:majority writeConcern.

Insert 500 documents and release the global lock
Pause the inserting thread
Another node steps up and the original primary rolls back the 500 writes already done
The original primary steps back up
The inserting thread then performs the remaining 500 writes, which get new optimes
That thread then waits for majority confirmation of the last writes and successfully returns to the user
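A toy Python model of the sequence above may help make the failure concrete. It is a sketch under stated assumptions, not MongoDB server code: `ToyNode`, `naive_ordered_insert`, and `wait_for_majority` are hypothetical names, a single node is simulated, and replication to other members is reduced to a stub that treats the newest surviving oplog entry as majority-committed. The only write check is the naive "can I currently write?" question.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical model of one replica set member: a term and an oplog."""
    term: int = 1
    is_primary: bool = True
    oplog: list = field(default_factory=list)  # entries are (term, doc)

    def can_write_now(self) -> bool:
        # The naive question: "can I currently write to this namespace?"
        return self.is_primary

    def log_op(self, doc):
        # Each write gets an optime in the node's current term.
        self.oplog.append((self.term, doc))

    def step_down_and_roll_back(self, n):
        self.is_primary = False
        self.term += 1  # another node is elected in a higher term
        del self.oplog[len(self.oplog) - n:]  # roll back the last n entries

    def step_up(self):
        self.term += 1  # this node wins a later election
        self.is_primary = True


def wait_for_majority(node):
    # Toy assumption: the newest surviving entry becomes majority-committed,
    # so waiting on the last optime succeeds whenever the oplog is non-empty.
    return bool(node.oplog)


def naive_ordered_insert(node, docs, pause_after, on_pause):
    for i, doc in enumerate(docs):
        if i == pause_after:
            on_pause()  # the global lock is released and the thread is paused here
        if not node.can_write_now():
            raise RuntimeError("not primary")
        node.log_op(doc)
    return wait_for_majority(node)


node = ToyNode()


def rollback_then_step_up():
    node.step_down_and_roll_back(500)
    node.step_up()


acknowledged = naive_ordered_insert(node, list(range(1000)), 500, rollback_then_step_up)
surviving = [doc for _, doc in node.oplog]
print(acknowledged, len(surviving), surviving[0])  # prints: True 500 500
```

The insert is acknowledged even though only documents 500 through 999 survive, all written in the later term: the naive check passes because the node is primary again by the time the thread resumes.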

In this case we've lost 500 writes that were acknowledged with w:majority, and we've applied later ops without the earlier ones despite ordered:true. This is caused by a combination of not killing all ops (or at least all writing ops) on every replSet stepdown path, not closing all connections, and always asking "can I currently write to this namespace?" rather than "have I been able to write to this namespace continuously since starting this op?".

This issue also affects any operation that writes multiple oplog entries with a release of the global lock in between, as well as "no-op" operations that fetch the last optime after releasing the global lock. A non-exhaustive list:

All batch write operations (insert, update, delete)
Multi-update and Multi-delete
Agg with $out
MapReduce

Potential solutions:

Fail all write ops and waitForWriteConcern if the electionId (or rbid) changed since the op began
Interrupt all write ops (or all ops) on all stepdown paths. This also requires either:
a) Ensuring all write ops check for interrupt after every acquisition of the global lock (currently they check before acquiring it), or
b) Making every lock acquisition call checkForInterrupt (this is already planned, to support interruptible locking)
Record the term at the beginning of every operation; in the logOp (and awaitReplication) code, check that the term of the write matches the recorded term and abort the write if it does not.
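The third option can be sketched with a toy Python model. This is a minimal illustration under stated assumptions, not the server's implementation: `ToyNode`, `TermChanged`, `ordered_insert`, and `roll_back_and_step_up` are hypothetical names, one node is simulated, and the replication wait itself is stubbed out. The term is recorded when the operation starts, and both the logOp step and the write-concern wait reject the operation if the node is no longer primary in that term.

```python
from dataclasses import dataclass, field


class TermChanged(Exception):
    """Hypothetical error: the node's term or role changed after the op began."""


@dataclass
class ToyNode:
    """Hypothetical model of one replica set member: a term and an oplog."""
    term: int = 1
    is_primary: bool = True
    oplog: list = field(default_factory=list)  # entries are (term, doc)

    def _check(self, op_term):
        # "Have I been able to write since starting this op?" is approximated
        # as "am I still primary, in the same term the op started in?"
        if not self.is_primary or self.term != op_term:
            raise TermChanged(f"op began in term {op_term}, node is now in term {self.term}")

    def log_op(self, doc, op_term):
        self._check(op_term)  # abort the write if the term no longer matches
        self.oplog.append((self.term, doc))

    def await_replication(self, op_term):
        self._check(op_term)  # never acknowledge a write concern across a term change
        return True  # toy: the replication wait itself is not modeled

    def roll_back_and_step_up(self, n):
        self.is_primary = False
        self.term += 1  # another node is elected
        del self.oplog[len(self.oplog) - n:]  # roll back the last n entries
        self.term += 1  # this node is elected again
        self.is_primary = True


def ordered_insert(node, docs, pause_after, on_pause):
    op_term = node.term  # recorded at the beginning of the operation
    for i, doc in enumerate(docs):
        if i == pause_after:
            on_pause()  # global lock released; the thread is paused here
        node.log_op(doc, op_term)
    return node.await_replication(op_term)


node = ToyNode()
try:
    ordered_insert(node, list(range(1000)), 500, lambda: node.roll_back_and_step_up(500))
    outcome = "acknowledged"
except TermChanged:
    outcome = "error"
print(outcome, len(node.oplog))  # prints: error 0
```

In this toy run the operation fails at the first write after the pause, so the client receives an error instead of a w:majority acknowledgement for a batch whose first half was rolled back. Keeping the `is_primary` check alongside the term comparison matters because a stepdown does not by itself imply a new term.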

causes

SERVER-34682 Old primary should vote yes and store the last vote after stepdown on learning of a higher term

Closed

is related to

SERVER-38354 Allow shutdown error when reading last applied optime on startup

Closed

SERVER-31277 Cancel all user operations on heartbeat stepdown path

Closed

SERVER-27545 Include RBID in replSet metadata of command replies

Closed

related to

SERVER-34672 Unable to add shard on 3.7.5 sharded cluster with mmapv1 shard

Closed

SERVER-37574 Force reconfig should kill user operations

Closed

SERVER-68874 Consider making waitAfterPinningCursorBeforeGetMoreBatch only hang instead of also fiddling with locks (while-loop taking and releasing locks)

Closed

SERVER-37381 Allow prepared transactions to survive state transitions

Closed


Assignee:: Justin Seyster
Reporter:: Mathias Stearn
Participants:: Andy Schwerin, David Storch, Eric Milkie, Geert Bosch, Githook User, Justin Seyster, Mathias Stearn, Spencer Brody, Tess Avitabile
Votes:: 0
Watchers:: 22

Created:: Dec 28 2016 06:28:42 PM UTC
Updated:: Aug 16 2022 08:01:41 PM UTC
Resolved:: Apr 19 2018 02:55:02 AM UTC
Confidence Status Last Update:: 27/Feb/18 5:08 PM