Type: Bug
Resolution: Done
Priority: Major - P3
Affects Version/s: 2.4.10, 2.6.0
Component/s: Replication
Operating System: ALL
We observed in production a replica set node going into a FATAL state as a result of a failed oplog query against an inaccessible primary node during the rollback 2 FindCommonPoint phase.
There may be other scenarios in which a replication failure results in a FATAL state, but this is the one instance we have observed in production.
FATAL node logs:
[rsBackgroundSync] replSet rollback 2 FindCommonPoint
[rsBackgroundSync] DBClientCursor::init call() failed
[rsBackgroundSync] replSet remote oplog empty or unreadable
[rsBackgroundSync] replSet error fatal, stopping replication
Primary replica set node relinquishing its PRIMARY status:
[rsMgr] replSet relinquishing primary state
[rsMgr] replSet SECONDARY
[rsMgr] replSet closing client sockets after relinquishing primary
(the fatal node tries unsuccessfully to query this node's oplog while the primary is closing client connections)
Health Poll logs on non-FATAL node in same replica set:
[rsHealthPoll] replSet member (fatal node hostname:port) is now in state FATAL
If there is a way to handle this case more gracefully, it might be possible to avoid going into a FATAL state.
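For illustration only, not the server's actual implementation: below is a minimal, self-contained C++ sketch of one possible graceful handling, using a hypothetical findCommonPointWithRetry helper that retries the remote oplog query a bounded number of times before falling back to the current fatal behaviour, so that a transient failure (such as the old primary closing client sockets while stepping down) does not immediately stop replication.

// Illustrative sketch only; not MongoDB server code.
#include <chrono>
#include <functional>
#include <iostream>
#include <optional>
#include <thread>

// Hypothetical result of a remote oplog query during FindCommonPoint.
struct CommonPoint { long long opTime; };

// Retry the common-point query a bounded number of times before giving up,
// so a transient connection failure does not stop replication permanently.
std::optional<CommonPoint> findCommonPointWithRetry(
        const std::function<std::optional<CommonPoint>()>& queryRemoteOplog,
        int maxAttempts = 3,
        std::chrono::milliseconds backoff = std::chrono::milliseconds(500)) {
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        if (auto cp = queryRemoteOplog()) {
            return cp;  // common point found; rollback can proceed
        }
        std::cout << "[rsBackgroundSync] remote oplog query failed (attempt "
                  << attempt << " of " << maxAttempts << "), retrying\n";
        std::this_thread::sleep_for(backoff);
    }
    // Only after all retries fail would the node fall back to the current
    // behaviour (FATAL / stop replication), or ideally re-enter RECOVERING
    // and choose a new sync source.
    return std::nullopt;
}

int main() {
    int calls = 0;
    // Simulated remote oplog query: fails twice (primary closing sockets),
    // then succeeds once a sync source is reachable again.
    auto query = [&calls]() -> std::optional<CommonPoint> {
        ++calls;
        if (calls < 3) {
            return std::nullopt;  // simulate "DBClientCursor::init call() failed"
        }
        return CommonPoint{12345};
    };
    if (auto cp = findCommonPointWithRetry(query)) {
        std::cout << "common point at opTime " << cp->opTime << "\n";
    } else {
        std::cout << "replSet error fatal, stopping replication\n";
    }
}

Even a couple of retries, or dropping back to RECOVERING and reselecting a sync source, would cover the short window in which the old primary is closing its client connections during step-down.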
is related to
- SERVER-15089 Thread applier (bgsync) through replication coordinator (Closed)
related to
- SERVER-18035 Data Replicator: Refactor Rollback Code (Closed)
- SERVER-5930 rollback loop should be smarter (Closed)