Type: Bug
Resolution: Duplicate
Priority: Major - P3
Affects Version/s: 2.3.2
Component/s: Replication
Environment: CentOS 6.2 x86_64
Operating System: ALL
Problem
We hit a serious problem: clients can read STALE DATA.
The problem comes from the combination of slaveDelay and ghost syncing.
Situation
Replica set members
"members" : [
    { "_id" : 0, "host" : "192.168.159.133:27017", "priority" : 2 },
    { "_id" : 1, "host" : "192.168.159.134:27017" },
    { "_id" : 2, "host" : "192.168.159.135:27017", "priority" : 0, "slaveDelay" : 300 }
]
Problem 1: syncFrom
rs.syncFrom('192.168.159.135:27017')
{
    "syncFromRequested" : "192.168.159.135:27017",
    "warning" : "requested member is more than 10 seconds behind us",
    "prevSyncTarget" : "192.168.159.133:27017",
    "ok" : 1
}
We see this warning when we request a bad sync target.
But we do not get the warning when the replica set has been idle, because the delayed member is not actually behind at that moment:
rs.syncFrom('192.168.159.135:27017')
{
    "syncFromRequested" : "192.168.159.135:27017",
    "prevSyncTarget" : "192.168.159.133:27017",
    "ok" : 1
}
This can lead to human error, but it is bearable because we can avoid it.
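For illustration, the lag that rs.syncFrom silently ignores here can be computed from rs.status()-style optimes. This is a hypothetical helper, not part of MongoDB; the member dicts below only mimic the fields (name, stateStr, optimeDate) that rs.status() reports for the replica set above:

```python
from datetime import datetime, timedelta

def lag_seconds(status_members):
    """Given rs.status()-style member docs, return each secondary's
    replication lag in seconds behind the primary's last optime."""
    primary_optime = next(m["optimeDate"] for m in status_members
                          if m["stateStr"] == "PRIMARY")
    return {m["name"]: (primary_optime - m["optimeDate"]).total_seconds()
            for m in status_members if m["stateStr"] == "SECONDARY"}

# Mock data mirroring our set: the slaveDelay member trails by 300 seconds.
now = datetime(2013, 1, 1, 12, 0, 0)
members = [
    {"name": "192.168.159.133:27017", "stateStr": "PRIMARY",   "optimeDate": now},
    {"name": "192.168.159.134:27017", "stateStr": "SECONDARY", "optimeDate": now},
    {"name": "192.168.159.135:27017", "stateStr": "SECONDARY",
     "optimeDate": now - timedelta(seconds=300)},
]
print(lag_seconds(members))
```

Running such a check before rs.syncFrom would catch the mis-setting even when the 10-second warning does not fire.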
Problem 2: automatic ghost sync caused by network trouble
On 192.168.159.133, simulate network trouble:
iptables -A INPUT -p tcp --dport 27017 -s 192.168.159.134 -j DROP
Then 192.168.159.134 is still available!
192.168.159.134 changes its sync target from the primary (192.168.159.133) to the slaveDelay secondary (192.168.159.135) and STAYS ALIVE in spite of being delayed!
But we (mongo clients) have no way to realize that 192.168.159.134 is now delayed.
We think the node should die (become unreachable from clients) instead of serving unexpectedly delayed data; then we (clients) would read fresh data from the primary.
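A toy model of what we observe (our sketch, not MongoDB's actual sync-source selection code): when the primary is blocked by iptables, the only reachable member with any optime is the slaveDelay secondary, so it gets chosen as the sync target; the `exclude_delayed` flag models the behavior we are asking for, where the node would instead find no source:

```python
def choose_sync_source(members, my_host, reachable, exclude_delayed=False):
    """Pick the reachable member with the freshest optime as sync source.
    With exclude_delayed=False a slaveDelay member can be chosen (the
    'ghost sync' described above); True models the requested fix."""
    candidates = [m for m in members
                  if m["host"] != my_host
                  and m["host"] in reachable
                  and not (exclude_delayed and m.get("slaveDelay", 0) > 0)]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m["optime"])["host"]

# Toy optimes (arbitrary ticks) for the replica set above.
members = [
    {"host": "192.168.159.133:27017", "optime": 1000},                    # primary
    {"host": "192.168.159.134:27017", "optime": 1000},                    # myself
    {"host": "192.168.159.135:27017", "optime": 700, "slaveDelay": 300},  # delayed
]
# The iptables rule hides the primary: only the delayed member is reachable.
reachable = {"192.168.159.135:27017"}
print(choose_sync_source(members, "192.168.159.134:27017", reachable))
print(choose_sync_source(members, "192.168.159.134:27017", reachable,
                         exclude_delayed=True))
```

In the second call the node would have no sync source, which matches our request that it stop serving rather than silently fall behind.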
- depends on: SERVER-7200 use oplog as op buffer on secondaries (Closed)
- related to: SERVER-4935 Mark node Recovering when replication lag exceeds a configured threshold (Closed)