-
Type: Bug
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: 4.0.9
-
Component/s: Replication
-
None
-
Fully Compatible
-
ALL
-
-
Repl 2019-06-17, Repl 2019-07-29, Repl 2019-08-12, Repl 2019-08-26, Repl 2019-09-09
-
(copied to CRM)
In a replica set with all nodes on v4.0 binary version and in FCV=3.6, a clean shutdown will cause a node to set its recovery timestamp to 0. If this happens for a node whose oplog has diverged (i.e. needs to enter rollback), this node won't be able to complete the rollback since it does not have a stable timestamp to roll back to which is needed for recover-to-timestamp. Furthermore, in order to take a new stable checkpoint, it would have to commit a new majority write, which it shouldn't be able to do until it completes the rollback. It also shouldn't be able to upgrade to FCV=4.0 until the node can completes the rollback and replicate new log entries from the primary. The recommended path forward is to downgrade the binary to 3.6 and restart to complete the rollback. We will update the error message with this recommendation.
Original description:
In a replica set with all nodes on v4.0 binary version and in FCV=3.6, a clean shutdown will cause a node to set its recovery timestamp to 0. If this happens for a node whose oplog has diverged (i.e. needs to enter rollback), this node won't be able to complete the rollback since it does not have a stable timestamp to roll back to which is needed for recover-to-timestamp. Furthermore, in order to take a new stable checkpoint, it would have to commit a new majority write, which it shouldn't be able to do until it completes the rollback. It also shouldn't be able to upgrade to FCV=4.0 until the node can completes the rollback and replicate new log entries from the primary. If FCV=3.6 and we encounter this situation, falling back on the rollbackViaFetch algorithm may be the appropriate solution. Another alternative may be to always use rollbackViaRefetch whenever FCV=3.6.
- related to
-
SERVER-41931 Remove forceRollbackViaRefetch flag
- Backlog
-
SERVER-32590 RTT 3.6<->4.0 upgrade/downgrade.
- Closed