Type: Bug
Resolution: Done
Priority: Major - P3
Affects Version/s: 2.0.2
Component/s: Replication
Labels: None
Environment: Ubuntu 10.04 LTS x64, Mac OS X 10.7
Operating System: ALL
If a replica set member with a higher priority comes online, the current primary relinquishes its primary state regardless of the state of that member, and before that member has reported any status. This means a functioning cluster loses its primary and is forced into an election simply because another member with a higher priority came online. This does not happen if no member has an explicit priority set.
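For illustration, a short mongo shell check (a sketch, using the hosts from the repro below) that prints each member's effective priority:

rs.conf().members.forEach(function (m) {
    // members with no explicit "priority" field default to 1
    print(m.host + " priority=" + (m.priority === undefined ? 1 : m.priority));
});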
1. Set up a replica set with 3 nodes, 2 of them with high priorities (shell commands for this sequence are sketched after the list):
{
    "_id" : "test1",
    "members" : [
        { "_id" : 0, "host" : "localhost:27017", "priority" : 3 },
        { "_id" : 1, "host" : "localhost:27018", "priority" : 2 },
        { "_id" : 2, "host" : "localhost:27019" }
    ]
}
2. Wait for the set to come online with "localhost:27017" as the primary
3. On "localhost:27017" issue rs.stepDown() so that "localhost:27018" becomes primary
4. Kill mongod on "localhost:27017"
5. Delete the data directory on "localhost:27017" so that when it comes back up it is not immediately ready to take over as primary and requires a resync.
6. Restart mongod on "localhost:27017"
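For reference, the shell half of the procedure (a sketch; it assumes the three mongod processes were started with --replSet test1 on the ports above, and that steps 4-6 are performed at the OS level):

// Step 1: connect to localhost:27017 and initiate the set
rs.initiate({
    "_id" : "test1",
    "members" : [
        { "_id" : 0, "host" : "localhost:27017", "priority" : 3 },
        { "_id" : 1, "host" : "localhost:27018", "priority" : 2 },
        { "_id" : 2, "host" : "localhost:27019" }
    ]
});

// Step 3: once localhost:27017 is primary, hand off to localhost:27018
rs.stepDown();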
What happens: "localhost:27018" immediately loses its primary state, then gets re-elected as primary.
What should happen: "localhost:27018" should remain primary until it is safe to re-elect "localhost:27017", the higher-priority node.
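One way to watch the unwanted transition from "localhost:27018" while performing steps 4-6 (a sketch; myState 1 is PRIMARY, 2 is SECONDARY):

while (true) {
    try {
        print(new Date() + " myState=" + rs.status().myState);
    } catch (e) {
        // the node closes client sockets when it relinquishes primary,
        // so one poll may fail while the shell reconnects
        print(new Date() + " " + e);
    }
    sleep(1000);
}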
Log from "localhost:27018":
Mon Jan 30 15:50:17 [rsHealthPoll] DBClientCursor::init call() failed
Mon Jan 30 15:50:17 [rsHealthPoll] replSet info localhost:27017 is down (or slow to respond): DBClientBase::findN: transport error: localhost:27017 query: { replSetHeartbeat: "test1", v: 1, pv: 1, checkEmpty: false, from: "localhost:27018" }
Mon Jan 30 15:50:17 [rsHealthPoll] replSet member localhost:27017 is now in state DOWN
Mon Jan 30 15:50:33 [rsHealthPoll] replSet member localhost:27017 is up
Mon Jan 30 15:50:33 [rsMgr] stepping down localhost:27018
Mon Jan 30 15:50:33 [rsMgr] replSet relinquishing primary state
Mon Jan 30 15:50:33 [rsMgr] replSet SECONDARY
Mon Jan 30 15:50:33 [rsMgr] replSet closing client sockets after reqlinquishing primary
Mon Jan 30 15:50:33 [conn1] end connection 127.0.0.1:57612
Mon Jan 30 15:50:33 [rsHealthPoll] replSet info localhost:27019 is down (or slow to respond): socket exception
Mon Jan 30 15:50:33 [rsHealthPoll] replSet member localhost:27019 is now in state DOWN
Mon Jan 30 15:50:33 [rsMgr] replSet not electing self, not all members up and we have been up less than 5 minutes
Mon Jan 30 15:50:35 [conn12] SocketException handling request, closing client connection: 9001 socket exception [2] server [127.0.0.1:57654]
Mon Jan 30 15:50:35 [rsHealthPoll] replSet member localhost:27019 is up
Mon Jan 30 15:50:35 [rsHealthPoll] replSet member localhost:27019 is now in state SECONDARY
Mon Jan 30 15:50:35 [rsMgr] not electing self, localhost:27019 would veto
Mon Jan 30 15:50:37 [rsMgr] not electing self, localhost:27019 would veto
Mon Jan 30 15:50:40 [conn10] end connection 127.0.0.1:57650
Mon Jan 30 15:50:40 [initandlisten] connection accepted from 127.0.0.1:57680 #13
Mon Jan 30 15:50:41 [rsMgr] not electing self, localhost:27019 would veto
Mon Jan 30 15:50:43 [initandlisten] connection accepted from 127.0.0.1:57683 #14
Mon Jan 30 15:50:46 [rsHealthPoll] replSet member localhost:27017 is now in state STARTUP2
Mon Jan 30 15:50:46 [rsMgr] not electing self, localhost:27017 would veto
Mon Jan 30 15:50:46 [rsMgr] not electing self, localhost:27017 would veto
Mon Jan 30 15:50:52 [rsMgr] replSet info electSelf 1
Mon Jan 30 15:50:52 [rsMgr] replSet PRIMARY
Mon Jan 30 15:50:54 [rsHealthPoll] replSet member localhost:27017 is now in state RECOVERING
Mon Jan 30 15:50:54 [initandlisten] connection accepted from 127.0.0.1:57687 #15
Mon Jan 30 15:50:59 [conn14] end connection 127.0.0.1:57683