-
Type: Bug
-
Resolution: Works as Designed
-
Priority: Major - P3
-
None
-
Affects Version/s: 3.4.14, 3.6.4
-
Component/s: Replication
-
None
-
ALL
-
v3.6, v3.4
-
Repl 2018-05-21
The TopologyCoordinator uses _memberData.getLastAppliedOpTime() to advance the commit point on primaries. That method returns _lastAppliedOpTime, which is set in advanceLastAppliedOpTime(). advanceLastAppliedOpTime() is called from setUpValues, which in turn is called when a heartbeat response is processed. As a result, optimes learned from heartbeats feed directly into commit point advancement.
This is a problem. Consider 3 nodes A, B, and C. A starts as the primary and commits OpTime(Timestamp(1,1), 1) to all nodes. A writes OpTime(Timestamp(2,1), 1) and it replicates to B, but A never receives the acknowledgement and never commits it. A also writes OpTime(Timestamp(3,1), 1), which stays only on A. B then runs for election in term 2 and C votes for it since B is ahead of C. A then steps down and runs for election again in term 3. C votes for it and it wins. B then takes a write at OpTime(Timestamp(4,1), 2) and A takes a write at OpTime(Timestamp(5,1), 3). A then gets a heartbeat from B, hears that B is at OpTime(Timestamp(4,1), 2), and commits all operations less than that, including OpTime(Timestamp(3,1), 1), which is only on A. If B then runs for election again in term 4 and C votes for it, A can begin syncing from B and roll back its majority-committed write.
It's possible that something will prevent the above from happening exactly as stated, and it may be easier to reproduce in a 5-node set. That said, it is definitely a problem (and currently possible) for a node to commit operations on its branch of history based on oplog entries that have higher optimes than the commit point but lower terms than its current term (and so would not cause a step down).
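The scenario above can be replayed in a small model. The following is a minimal sketch in Python, not the server's code: OpTime, majority_optime, advance_commit_point, and the require_current_term flag are hypothetical names introduced for illustration, and ordering optimes by (term, timestamp) is an assumption. It compares advancing the commit point to whatever a majority reports against a guard that only accepts a candidate from the primary's current term (the rule Raft uses for leaders).

```python
from typing import List, NamedTuple, Tuple


class OpTime(NamedTuple):
    """Hypothetical stand-in for an optime; ordered by (term, timestamp)."""
    term: int
    ts: Tuple[int, int]  # (seconds, increment)


def majority_optime(last_applied: List[OpTime]) -> OpTime:
    """Highest optime that a majority of members report having applied."""
    ordered = sorted(last_applied, reverse=True)
    # With n members, the (n // 2 + 1)-th highest value is held by a majority.
    return ordered[len(ordered) // 2]


def advance_commit_point(commit: OpTime,
                         last_applied: List[OpTime],
                         current_term: int,
                         require_current_term: bool) -> OpTime:
    """Return the new commit point as seen by a primary in current_term."""
    candidate = majority_optime(last_applied)
    if candidate <= commit:
        return commit  # never move the commit point backwards
    if require_current_term and candidate.term != current_term:
        # The candidate was written in an older term, possibly on another
        # branch of history, so it says nothing about this primary's oplog.
        return commit
    return candidate


# State on A (primary in term 3) after the heartbeat from B:
commit = OpTime(1, (1, 1))
view_on_a = [
    OpTime(3, (5, 1)),  # A's own last applied optime
    OpTime(2, (4, 1)),  # B's optime, learned from the heartbeat
    OpTime(1, (1, 1)),  # C's optime
]

unsafe = advance_commit_point(commit, view_on_a, current_term=3,
                              require_current_term=False)
safe = advance_commit_point(commit, view_on_a, current_term=3,
                            require_current_term=True)
print("without term check:", unsafe)
print("with term check:   ", safe)
```

Without the term check, the commit point on A moves to B's term-2 optime, which on A's branch covers OpTime(Timestamp(3,1), 1), an entry that exists only on A. With the check, the commit point stays at OpTime(Timestamp(1,1), 1) until a majority has applied an entry written in term 3.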
Related to:
- SERVER-27123 Only update commit point via spanning tree (Closed)
- SERVER-29076 Replace all usage of heartbeat op times with lastAppliedOpTime / lastDurableOptime (Backlog)
- SERVER-29079 Unify liveness information between spanning tree and heartbeat updates (Backlog)
- SERVER-26990 Unify tracking of secondary state between replication and topology coordinators (Closed)
- SERVER-29078 Eliminate use of memberHeartbeatData in replication_coordinator_impl (Closed)