Type: Bug
Resolution: Duplicate
Priority: Major - P3
Affects Version/s: 2.3.2
Component/s: Replication
Environment: CentOS 6.2 x86_64
Operating System: ALL
Problem
We hit a serious problem: clients can read STALE DATA.
The problem comes from the combination of slaveDelay and ghost syncing.
Situation
Replica set members
"members" : [
    { "_id" : 0, "host" : "192.168.159.133:27017", "priority" : 2 },
    { "_id" : 1, "host" : "192.168.159.134:27017" },
    { "_id" : 2, "host" : "192.168.159.135:27017", "priority" : 0, "slaveDelay" : 300 }
]
Problem 1: syncFrom
rs.syncFrom('192.168.159.135:27017')
{
    "syncFromRequested" : "192.168.159.135:27017",
    "warning" : "requested member is more than 10 seconds behind us",
    "prevSyncTarget" : "192.168.159.133:27017",
    "ok" : 1
}
We see this warning when we request a bad sync target.
But we do not get the warning when the replica set has been idle, because the delayed member is not actually behind at that moment:
rs.syncFrom('192.168.159.135:27017')
{
    "syncFromRequested" : "192.168.159.135:27017",
    "prevSyncTarget" : "192.168.159.133:27017",
    "ok" : 1
}
This can lead to human error, but it is bearable because we can avoid it.
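For illustration, the lag that rs.syncFrom silently ignores here can be computed from rs.status()-style optimes. This is a hypothetical helper, not part of MongoDB; the member dicts below only mimic the fields (name, stateStr, optimeDate) that rs.status() reports for the replica set above:

```python
from datetime import datetime, timedelta

def lag_seconds(status_members):
    """Given rs.status()-style member docs, return each secondary's
    replication lag in seconds behind the primary's last optime."""
    primary_optime = next(m["optimeDate"] for m in status_members
                          if m["stateStr"] == "PRIMARY")
    return {m["name"]: (primary_optime - m["optimeDate"]).total_seconds()
            for m in status_members if m["stateStr"] == "SECONDARY"}

# Mock data mirroring our set: the slaveDelay member trails by 300 seconds.
now = datetime(2013, 1, 1, 12, 0, 0)
members = [
    {"name": "192.168.159.133:27017", "stateStr": "PRIMARY",   "optimeDate": now},
    {"name": "192.168.159.134:27017", "stateStr": "SECONDARY", "optimeDate": now},
    {"name": "192.168.159.135:27017", "stateStr": "SECONDARY",
     "optimeDate": now - timedelta(seconds=300)},
]
print(lag_seconds(members))
```

Running such a check before rs.syncFrom would catch the mis-setting even when the 10-second warning does not fire.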
Problem 2: automatic ghost sync caused by network trouble
On 192.168.159.133, simulate network trouble:
iptables -A INPUT -p tcp --dport 27017 -s 192.168.159.134 -j DROP
Then 192.168.159.134 is still available!
192.168.159.134 changes its sync target from the primary (192.168.159.133) to the slaveDelay secondary (192.168.159.135) and STAYS ALIVE in spite of being delayed!
But we (mongo clients) have no way to realize that 192.168.159.134 is now delayed.
We think the node should die (become unreachable from clients) instead of serving unexpectedly delayed data; then we (clients) would read fresh data from the primary.
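A toy model of what we observe (our sketch, not MongoDB's actual sync-source selection code): when the primary is blocked by iptables, the only reachable member with any optime is the slaveDelay secondary, so it gets chosen as the sync target; the `exclude_delayed` flag models the behavior we are asking for, where the node would instead find no source:

```python
def choose_sync_source(members, my_host, reachable, exclude_delayed=False):
    """Pick the reachable member with the freshest optime as sync source.
    With exclude_delayed=False a slaveDelay member can be chosen (the
    'ghost sync' described above); True models the requested fix."""
    candidates = [m for m in members
                  if m["host"] != my_host
                  and m["host"] in reachable
                  and not (exclude_delayed and m.get("slaveDelay", 0) > 0)]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m["optime"])["host"]

# Toy optimes (arbitrary ticks) for the replica set above.
members = [
    {"host": "192.168.159.133:27017", "optime": 1000},                    # primary
    {"host": "192.168.159.134:27017", "optime": 1000},                    # myself
    {"host": "192.168.159.135:27017", "optime": 700, "slaveDelay": 300},  # delayed
]
# The iptables rule hides the primary: only the delayed member is reachable.
reachable = {"192.168.159.135:27017"}
print(choose_sync_source(members, "192.168.159.134:27017", reachable))
print(choose_sync_source(members, "192.168.159.134:27017", reachable,
                         exclude_delayed=True))
```

In the second call the node would have no sync source, which matches our request that it stop serving rather than silently fall behind.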
- depends on: SERVER-7200 use oplog as op buffer on secondaries (Closed)
- related to: SERVER-4935 Mark node Recovering when replication lag exceeds a configured threshold (Closed)