-
Type: Improvement
-
Resolution: Duplicate
-
Priority: Major - P3
-
None
-
Affects Version/s: 3.4.14
-
Component/s: Replication
-
None
-
Repl 2018-07-30
-
With chained replication disabled, when the current primary becomes unresponsive and the secondaries elect a new primary, the secondaries have been observed to keep syncing from the original primary for a noticeable amount of time instead of switching to the new one as soon as it transitions to PRIMARY. This causes the following issues:
- The new primary will fail to acknowledge w:2+ writes since there are no secondaries syncing from it, effectively making the outage longer
- If the original primary gets unblocked, there is likely to be a rollback not only on that primary but also on the secondaries.
- The rollback can happen on a majority of the replica set members
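The first issue can be illustrated with a toy model. This is only a sketch, not MongoDB code: the `Member` class and `can_satisfy_write_concern` function are hypothetical names, and the model assumes that a secondary can acknowledge a write only if it is currently syncing directly from the member that accepted it (which is the situation with chaining disabled).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    """Hypothetical, simplified view of one replica set member."""
    name: str
    sync_source: Optional[str] = None  # None means "not syncing from anyone"


def can_satisfy_write_concern(primary: str, members: List[Member], w: int) -> bool:
    """Return True if at least `w` members could acknowledge a write on `primary`.

    Toy assumption: a secondary acknowledges only if it syncs directly
    from `primary`. The primary itself counts as one acknowledgement.
    """
    acks = 1  # the primary's own acknowledgement
    acks += sum(
        1 for m in members
        if m.name != primary and m.sync_source == primary
    )
    return acks >= w


# B was just elected, but C is still syncing from the old primary A.
stale = [Member("A"), Member("B"), Member("C", sync_source="A")]
# Same set after C has switched its sync source to B.
switched = [Member("A"), Member("B"), Member("C", sync_source="B")]

w2_ok_while_stale = can_satisfy_write_concern("B", stale, w=2)
w2_ok_after_switch = can_satisfy_write_concern("B", switched, w=2)
```

In this model, w:2 writes on the new primary cannot be acknowledged until at least one secondary changes its sync source, which is why the delay effectively extends the outage.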
It would be better if the secondaries re-evaluated their sync source immediately after the new primary becomes available for writes.
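The requested behavior could look roughly like the following sketch. It is an illustration under stated assumptions, not the server's actual sync source selection logic: `choose_sync_source` and its parameters are hypothetical, and the case where chaining is allowed is deliberately not modeled beyond keeping the current source.

```python
from typing import Optional


def choose_sync_source(
    current_source: Optional[str],
    known_primary: Optional[str],
    chaining_allowed: bool,
) -> Optional[str]:
    """Pick the member a secondary should sync from (hypothetical helper).

    Intended to be called whenever the secondary learns of a primary
    change, rather than only after the current source times out.
    """
    if chaining_allowed:
        # Not modeled here: with chaining allowed, any suitable member
        # may serve as a source, so keep the current one if there is one.
        return current_source or known_primary
    # Chaining disabled: only the primary is a valid sync source, so
    # switch as soon as a different primary is known.
    if known_primary is not None and current_source != known_primary:
        return known_primary
    return current_source


# A secondary syncing from old primary A learns that B is now primary.
new_source = choose_sync_source("A", "B", chaining_allowed=False)
```

The key design point is the trigger: the decision runs on the primary-change event itself, so a secondary does not wait for its connection to the unresponsive original primary to fail.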
- duplicates
-
SERVER-35200 Speed up failure detection in the OplogFetcher during steady state replication
- Closed
- is related to
-
SERVER-35996 Create performance tests for measuring failover speed for planned stepdowns
- Closed
-
SERVER-35200 Speed up failure detection in the OplogFetcher during steady state replication
- Closed