- Type: Improvement
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Replication, Storage
- Backwards Compatibility: Fully Compatible
- Sprint: Repl 2018-11-19, Repl 2018-12-03, Repl 2018-12-17, Repl 2019-01-14
If the replication majority commit point lags far enough behind the primary, the oplog entry on the primary that corresponds to the majority commit point can reach the oldest end of the oplog and be deleted. When that happens, a majority of the secondaries can become too stale to continue replicating and will require a full resync. With a majority of the set's nodes in initial sync, there is no healthy majority to elect a primary, so the set would have a prolonged period without write availability.
One possible mitigation is to prevent the primary from deleting oplog entries that are at or ahead of the replication commit point, so that there is always a common point between the primary's oplog and the oplogs of a majority of the secondaries.
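The proposed rule can be illustrated with a toy model. The following is only a sketch under simplifying assumptions, not the server's implementation: the oplog is modeled as a deque of integer timestamps, the size cap is an entry count, and the names `truncate_oplog`, `is_too_stale`, `max_entries`, and `majority_commit_ts` are hypothetical.

```python
from collections import deque


def truncate_oplog(oplog, max_entries, majority_commit_ts=None):
    """Drop the oldest entries until the oplog fits in max_entries.

    If majority_commit_ts is given, entries at or ahead of it are never
    dropped, even if the oplog stays over its cap (the proposed mitigation).
    Returns the list of removed timestamps.
    """
    removed = []
    while len(oplog) > max_entries:
        if majority_commit_ts is not None and oplog[0] >= majority_commit_ts:
            break  # oldest entry is at/ahead of the commit point: keep it
        removed.append(oplog.popleft())
    return removed


def is_too_stale(last_applied_ts, source_oplog):
    """A secondary is too stale when its last applied op is older than
    the oldest entry still present in its sync source's oplog."""
    return last_applied_ts < source_oplog[0]


# Hypothetical scenario: the primary has written ops 1..10, while the
# majority commit point (and a lagging secondary) is still at 5.
naive = deque(range(1, 11))
truncate_oplog(naive, max_entries=4)  # no commit-point guard
naive_stale = is_too_stale(5, naive)

guarded = deque(range(1, 11))
removed = truncate_oplog(guarded, max_entries=4, majority_commit_ts=5)
guarded_stale = is_too_stale(5, guarded)
```

In this model the unguarded truncation removes the entry the lagging secondary needs, while the guarded version lets the oplog exceed its cap so that the entry at the commit point survives. The trade-off is that the oplog can grow beyond its configured size while the commit point lags.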
- is duplicated by
  - SERVER-29215 Coordinate oplog truncate point with checkpoint timestamp (Closed)
- is related to
  - SERVER-22766 Dynamic oplog sizing for WiredTiger nodes (Closed)
  - SERVER-36494 Prevent oplog truncation of oplog entries needed for startup recovery (Closed)
- related to
  - SERVER-29125 Add $changeNotification stage that always outputs the single last oplog entry, unmodified (Closed)