- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- None
- Replication
- Minor Change
- ALL
- v8.0
- Repl 2024-07-08
If a node that was repaired via --repair rejoins a replica set it was previously a part of (as in this test), it will go through initial sync. We run repair in cases of potential data loss or corruption, so we don't want the repaired node to become primary and potentially spread data loss across the replica set.
When the repaired node goes through initial sync, its stable timestamp is already set to a non-zero value left over from before the repair, which initial sync does not expect. Initial sync never sets the stable timestamp itself, but it does advance the oldest timestamp as it applies oplog entries during its oplog application phase. The oldest timestamp is therefore likely to advance past the stable timestamp, and we can then take a stable checkpoint in that state when initial sync completes, which leads to unsafe and undefined behavior in the storage engine (see SERVER-84706 for more details).
EDIT: In practice this causes the server to hit an invariant when it attempts to set the oldest timestamp past the stable timestamp, since WiredTiger does not allow that.
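A minimal standalone model of that ordering rule (not the actual server or WiredTiger code; TimestampModel, setStable, and setOldest are illustrative names) showing how a stale non-zero stable timestamp plus an advancing oldest timestamp trips the invariant:

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>

// Models the WiredTiger rule that the oldest timestamp may never be
// set ahead of the stable timestamp. All names here are illustrative.
struct TimestampModel {
    uint64_t stable = 0;  // 0 == null/unset, as on a freshly wiped node
    uint64_t oldest = 0;

    void setStable(uint64_t ts) { stable = ts; }

    void setOldest(uint64_t ts) {
        // WiredTiger rejects oldest > stable; the server turns that
        // rejection into a fatal invariant. This assert stands in for it.
        assert((stable == 0 || ts <= stable) &&
               "oldest timestamp may not pass the stable timestamp");
        oldest = ts;
    }
};

int main() {
    TimestampModel node;

    // A repaired node rejoins with a stale, non-zero stable timestamp
    // left over from before the repair.
    node.setStable(100);

    // Initial sync never touches the stable timestamp, but oplog
    // application keeps advancing the oldest timestamp...
    node.setOldest(90);   // fine: 90 <= 100
    node.setOldest(150);  // aborts: oldest would pass stable

    std::cout << "unreachable in the failing scenario\n";
}
```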
Since the repaired node is going to go through initial sync anyway, we should require that the node be wiped beforehand. One option is to require that the stable timestamp be null for all nodes before starting initial sync.
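A hypothetical sketch of that precondition (not the actual patch; getStableTimestamp and assertSafeToStartInitialSync are made-up names standing in for a storage-engine query and an initial-sync entry point):

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-in for a storage-engine query; returns 0 when the
// stable timestamp is null (e.g. on a freshly wiped node).
static uint64_t getStableTimestamp() {
    return 0;  // a wiped node carries no stable timestamp
}

// Hypothetical precondition per the proposal above: reject a
// repaired-but-unwiped node before initial sync can take a stable
// checkpoint in an inconsistent state.
static void assertSafeToStartInitialSync() {
    assert(getStableTimestamp() == 0 &&
           "stable timestamp must be null before initial sync; "
           "wipe the dbpath first");
}

int main() {
    assertSafeToStartInitialSync();  // passes on a wiped node
    return 0;
}
```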
is related to:
- SERVER-84706 Investigate if setting the oldest timestamp greater than the stable timestamp can be avoided (Closed)
- SERVER-35731 Prevent a repaired node from re-joining a replica set (Closed)
- SERVER-85722 Investigate assumption that mongo layer always tells storage engine when to take a checkpoint (Closed)