Type: Bug
Resolution: Done
Priority: Major - P3
Affects Version/s: 2.0.1
Component/s: Replication
Environment: Ubuntu 11.04, AWS m2.2xlarge
Linux
We have seen an issue a few times on the same replica: the secondary is doing about 30-40k updates/sec and still reports state SECONDARY, but it never catches up to the primary. It just stays the same amount behind (normally a few minutes).
The primary is doing about 10 or fewer updates/sec, so it does not look like a case of the primary generating more work than the secondary can keep up with.
It has been fixed twice by stopping and starting the secondary, after which it normally catches up.
Also, when it happens, db.printReplicationInfo() shows a very small oplog window (< 30 min). The odd part is that if you keep checking, the window keeps growing while the first event time stays the same, until the oplog grows large enough. It is almost as if the oplog had been reset.
PRIMARY> db.printReplicationInfo()
configured oplog size: 4096MB
log length start to end: 92276secs (25.63hrs)
oplog first event time: Wed Nov 09 2011 19:13:39 GMT+0000 (UTC)
oplog last event time: Thu Nov 10 2011 20:51:35 GMT+0000 (UTC)
now: Thu Nov 10 2011 20:51:35 GMT+0000 (UTC)
PRIMARY> db.printReplicationInfo()
configured oplog size: 4096MB
log length start to end: 92488secs (25.69hrs)
oplog first event time: Wed Nov 09 2011 19:13:39 GMT+0000 (UTC)
oplog last event time: Thu Nov 10 2011 20:55:07 GMT+0000 (UTC)
now: Thu Nov 10 2011 20:55:07 GMT+0000 (UTC)
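
To make the pattern in the two outputs above easier to check, here is a minimal Python sketch that parses pasted db.printReplicationInfo() text and recomputes the oplog window. It is only an illustration: the helper names (parse_event_time, oplog_window) are made up for this example, and it assumes the timestamp layout shown above (English day and month names, "GMT+0000 (UTC)" suffix). It does not talk to a server.

```python
from datetime import datetime

# Timestamp layout used in the output above, once the trailing
# "GMT+0000 (UTC)" suffix is dropped (assumes English day/month names).
TIME_FORMAT = "%a %b %d %Y %H:%M:%S"


def parse_event_time(line):
    """Parse a line such as
    'oplog first event time: Wed Nov 09 2011 19:13:39 GMT+0000 (UTC)'."""
    value = line.split(": ", 1)[1]
    stamp = value.split(" GMT")[0]
    return datetime.strptime(stamp, TIME_FORMAT)


def oplog_window(output):
    """Return (first_event, last_event, window_in_seconds) for one pasted output."""
    first = last = None
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("oplog first event time:"):
            first = parse_event_time(line)
        elif line.startswith("oplog last event time:"):
            last = parse_event_time(line)
    if first is None or last is None:
        raise ValueError("output does not contain both oplog event times")
    return first, last, int((last - first).total_seconds())


# The two outputs pasted in this report.
SAMPLE_1 = """configured oplog size: 4096MB
log length start to end: 92276secs (25.63hrs)
oplog first event time: Wed Nov 09 2011 19:13:39 GMT+0000 (UTC)
oplog last event time: Thu Nov 10 2011 20:51:35 GMT+0000 (UTC)
now: Thu Nov 10 2011 20:51:35 GMT+0000 (UTC)"""

SAMPLE_2 = """configured oplog size: 4096MB
log length start to end: 92488secs (25.69hrs)
oplog first event time: Wed Nov 09 2011 19:13:39 GMT+0000 (UTC)
oplog last event time: Thu Nov 10 2011 20:55:07 GMT+0000 (UTC)
now: Thu Nov 10 2011 20:55:07 GMT+0000 (UTC)"""

first_1, last_1, window_1 = oplog_window(SAMPLE_1)
first_2, last_2, window_2 = oplog_window(SAMPLE_2)

# How much the window grew versus how much the last event time advanced.
window_growth = window_2 - window_1
last_event_advance = int((last_2 - last_1).total_seconds())
first_event_unchanged = first_1 == first_2
```

For the two samples above, the recomputed windows match the reported log lengths (92276 and 92488 seconds), and the window grows by exactly the amount the last event time advances because the first event time does not move. That is the behavior expected while the oplog has not yet wrapped.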