Type: Improvement
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 3.4.14
Component/s: Replication
Labels:
None

Sprint:
Repl 2018-07-30
Case:
Confidence Status:
None
Work Order:
3

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

With chained replication disabled, when the current primary becomes unresponsive and the secondaries elect a new primary, the secondaries keep syncing from the original primary for a notable amount of time instead of switching to the new one as soon as it transitions to PRIMARY. This causes the following issues:

The new primary cannot acknowledge w:2+ writes because no secondaries are syncing from it, which effectively prolongs the outage.
If the original primary gets unblocked, a rollback is likely not only on that primary but also on the secondaries that kept syncing from it.
As a result, the rollback can affect a majority of the replica set members.
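
The first issue above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not MongoDB's implementation: the member names `A`, `B`, `C` and the helpers `acknowledged_by` and `can_satisfy` are hypothetical, and a secondary is assumed to receive a write only if it syncs directly from the primary (chaining disabled).

```python
from typing import Dict, Optional


def acknowledged_by(primary: str, sync_source: Dict[str, Optional[str]]) -> int:
    """Count members that can hold a write accepted by `primary`.

    Toy assumption: with chaining disabled, a secondary replicates the
    write only if its sync source is `primary` itself.
    """
    followers = sum(
        1 for node, src in sync_source.items()
        if node != primary and src == primary
    )
    return 1 + followers  # the primary itself counts as one member


def can_satisfy(w: int, primary: str, sync_source: Dict[str, Optional[str]]) -> bool:
    """Return True if a write with write concern `w` could be acknowledged."""
    return acknowledged_by(primary, sync_source) >= w


# After failover: B is the new primary, but C still syncs from the old primary A.
stale = {"A": None, "B": None, "C": "A"}
# After C re-evaluates its sync source and switches to B.
switched = {"A": None, "B": None, "C": "B"}
```

In this model `can_satisfy(2, "B", stale)` is false until `C` switches its sync source, which mirrors the extended outage described above.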

It would be better if the secondaries re-evaluated their sync source immediately after the new primary becomes available for writes.
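
A minimal sketch of the requested behavior, assuming a hypothetical `Secondary` class with an `on_primary_change` hook (neither exists in the server code base; they are illustrative names only). The idea is that, with chaining disabled, a secondary switches to the new primary as soon as it learns of the election instead of waiting for its current source to time out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Secondary:
    """Hypothetical secondary that tracks only its current sync source."""
    name: str
    sync_source: Optional[str] = None

    def on_primary_change(self, new_primary: str, chaining_allowed: bool) -> bool:
        """Re-evaluate the sync source; return True if it was switched.

        Simplification: with chaining allowed, this sketch leaves the
        current source alone. With chaining disabled, the only valid
        source is the primary, so switch immediately.
        """
        if chaining_allowed:
            return False
        if self.sync_source == new_primary:
            return False  # already syncing from the new primary
        self.sync_source = new_primary
        return True
```

For example, a secondary created as `Secondary("C", sync_source="A")` would switch to `B` on the first `on_primary_change("B", chaining_allowed=False)` call and do nothing on a repeated call.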

duplicates

SERVER-35200 Speed up failure detection in the OplogFetcher during steady state replication

Closed

is related to

SERVER-35996 Create performance tests for measuring failover speed for planned stepdowns

Closed

SERVER-35200 Speed up failure detection in the OplogFetcher during steady state replication

Closed

Assignee: Tess Avitabile (Inactive)
Reporter: Dmitry Ryabtsev
Participants: Dmitry Ryabtsev, Spencer Brody, Tess Avitabile
Votes: 1
Watchers: 12

Created: Jun 27 2018 02:06:01 AM UTC
Updated: Jul 26 2018 06:22:19 PM UTC
Resolved: Jul 06 2018 06:51:32 PM UTC
