- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Replication
- Fully Compatible
- ALL
- Repl 2018-04-23
Consider the following scenario:
1. Cleanly shut down a node running a 3.6 binary. Its appliedThrough value will be null.
2. Bring the node up with a 4.0 binary. Replication recovery will do nothing since we are consistent at the top of the oplog; there is no appliedThrough or recoveryTimestamp.
3. The node starts taking writes from ts=T1 to ts=T2, as a primary or secondary. These writes are written to the oplog, but only the oplog writes are journaled. The appliedThrough may move forward if the node is a secondary, but those appliedThrough updates are not journaled either.
4. Now, before we take a stable checkpoint, the node crashes.
5. Restart the 4.0 binary node. The node starts up with the same data as at step 2 (reflecting a consistent point at T1), but also with the oplog entries through T2 from step 3.
6. There is no recoveryTimestamp and the appliedThrough is null, so we assume we are consistent at the top of the oplog, T2, when in reality we are consistent at T1. As a result, we do not replay the oplog entries from T1 to T2.
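The scenario above can be sketched as a small model of the startup decision. This is an illustration only, not the server's recovery code: `NodeState`, `assumed_consistent_point`, `entries_replayed`, and `entries_lost` are hypothetical names, timestamps are plain integers, and the precedence (recoveryTimestamp, then appliedThrough, then top of oplog) is assumed from the description in steps 2 and 6.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeState:
    """Hypothetical, simplified view of what is on disk at startup."""
    data_ts: int                       # timestamp the data files actually reflect
    oplog_top: int                     # timestamp of the newest journaled oplog entry
    applied_through: Optional[int]     # None models a null appliedThrough
    recovery_timestamp: Optional[int]  # None models "no stable checkpoint taken"


def assumed_consistent_point(state: NodeState) -> int:
    """Timestamp at which recovery assumes the data is consistent (assumed precedence)."""
    if state.recovery_timestamp is not None:
        return state.recovery_timestamp
    if state.applied_through is not None:
        return state.applied_through
    # Neither value is set: assume the data matches the top of the oplog.
    return state.oplog_top


def entries_replayed(state: NodeState) -> List[int]:
    """Oplog timestamps recovery would replay: (assumed point, oplog top]."""
    start = assumed_consistent_point(state)
    return list(range(start + 1, state.oplog_top + 1))


def entries_lost(state: NodeState) -> List[int]:
    """Oplog timestamps missing from the data that recovery never replays."""
    start = assumed_consistent_point(state)
    return list(range(state.data_ts + 1, start + 1))


# Step 5 of the scenario with T1 = 10 and T2 = 15: data is at T1, oplog is at T2,
# and neither appliedThrough nor recoveryTimestamp survived the crash.
after_crash = NodeState(data_ts=10, oplog_top=15,
                        applied_through=None, recovery_timestamp=None)
print(assumed_consistent_point(after_crash))  # assumed point is T2
print(entries_replayed(after_crash))          # nothing is replayed
print(entries_lost(after_crash))              # T1+1 .. T2 are silently skipped
```

In this model, if a recovery timestamp equal to T1 had been persisted, the same functions would replay every entry after T1 and `entries_lost` would be empty.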
- is depended on by
  WT-3959 Recovery timestamp set on restart scenarios need addressing (Closed)
- is duplicated by
  SERVER-34350 Properly initialize initial data timestamp (Closed)
- related to
  SERVER-34716 Add unittest of lastStableCheckpointTimestamp field in ReplSetGetStatus (Closed)