Type: Bug
Resolution: Done
Priority: Major - P3
Affects Version/s: 2.0.2
Component/s: Replication
Labels: None
Environment: Ubuntu 10.04 LTS x64, Mac OS X 10.7
Operating System: ALL
If a replica set member with a higher priority comes online, the current primary relinquishes its primary state regardless of the state of that member, and before that member has reported any status. This means a functioning cluster loses its primary and is forced into an election simply because another member with a higher priority came online. This does not happen if no member has an explicit priority set.
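For illustration, a short mongo shell check (a sketch, using the hosts from the repro below) that prints each member's effective priority:

rs.conf().members.forEach(function (m) {
    // members with no explicit "priority" field default to 1
    print(m.host + " priority=" + (m.priority === undefined ? 1 : m.priority));
});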
1. Set up a replica set with 3 nodes, 2 of them with high priorities (shell commands for this sequence are sketched after the list):
{
    "_id" : "test1",
    "members" : [
        { "_id" : 0, "host" : "localhost:27017", "priority" : 3 },
        { "_id" : 1, "host" : "localhost:27018", "priority" : 2 },
        { "_id" : 2, "host" : "localhost:27019" }
    ]
}
2. Wait for the set to come online with "localhost:27017" as the primary
3. On "localhost:27017" issue rs.stepDown() so that "localhost:27018" becomes primary
4. Kill mongod on "localhost:27017"
5. Delete the data directory on "localhost:27017" so that when it comes back up it is not immediately ready to take over as primary and requires a resync.
6. Restart mongod on "localhost:27017"
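For reference, the shell half of the procedure (a sketch; it assumes the three mongod processes were started with --replSet test1 on the ports above, and that steps 4-6 are performed at the OS level):

// Step 1: connect to localhost:27017 and initiate the set
rs.initiate({
    "_id" : "test1",
    "members" : [
        { "_id" : 0, "host" : "localhost:27017", "priority" : 3 },
        { "_id" : 1, "host" : "localhost:27018", "priority" : 2 },
        { "_id" : 2, "host" : "localhost:27019" }
    ]
});

// Step 3: once localhost:27017 is primary, hand off to localhost:27018
rs.stepDown();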
What happens: "localhost:27018" immediately loses its primary state, then gets re-elected as primary.
What should happen: "localhost:27018" should remain primary until it is safe to re-elect "localhost:27017", the higher-priority node.
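One way to watch the unwanted transition from "localhost:27018" while performing steps 4-6 (a sketch; myState 1 is PRIMARY, 2 is SECONDARY):

while (true) {
    try {
        print(new Date() + " myState=" + rs.status().myState);
    } catch (e) {
        // the node closes client sockets when it relinquishes primary,
        // so one poll may fail while the shell reconnects
        print(new Date() + " " + e);
    }
    sleep(1000);
}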
Log from "localhost:27018":
Mon Jan 30 15:50:17 [rsHealthPoll] DBClientCursor::init call() failed
Mon Jan 30 15:50:17 [rsHealthPoll] replSet info localhost:27017 is down (or slow to respond): DBClientBase::findN: transport error: localhost:27017 query: { replSetHeartbeat: "test1", v: 1, pv: 1, checkEmpty: false, from: "localhost:27018" }
Mon Jan 30 15:50:17 [rsHealthPoll] replSet member localhost:27017 is now in state DOWN
Mon Jan 30 15:50:33 [rsHealthPoll] replSet member localhost:27017 is up
Mon Jan 30 15:50:33 [rsMgr] stepping down localhost:27018
Mon Jan 30 15:50:33 [rsMgr] replSet relinquishing primary state
Mon Jan 30 15:50:33 [rsMgr] replSet SECONDARY
Mon Jan 30 15:50:33 [rsMgr] replSet closing client sockets after reqlinquishing primary
Mon Jan 30 15:50:33 [conn1] end connection 127.0.0.1:57612
Mon Jan 30 15:50:33 [rsHealthPoll] replSet info localhost:27019 is down (or slow to respond): socket exception
Mon Jan 30 15:50:33 [rsHealthPoll] replSet member localhost:27019 is now in state DOWN
Mon Jan 30 15:50:33 [rsMgr] replSet not electing self, not all members up and we have been up less than 5 minutes
Mon Jan 30 15:50:35 [conn12] SocketException handling request, closing client connection: 9001 socket exception [2] server [127.0.0.1:57654]
Mon Jan 30 15:50:35 [rsHealthPoll] replSet member localhost:27019 is up
Mon Jan 30 15:50:35 [rsHealthPoll] replSet member localhost:27019 is now in state SECONDARY
Mon Jan 30 15:50:35 [rsMgr] not electing self, localhost:27019 would veto
Mon Jan 30 15:50:37 [rsMgr] not electing self, localhost:27019 would veto
Mon Jan 30 15:50:40 [conn10] end connection 127.0.0.1:57650
Mon Jan 30 15:50:40 [initandlisten] connection accepted from 127.0.0.1:57680 #13
Mon Jan 30 15:50:41 [rsMgr] not electing self, localhost:27019 would veto
Mon Jan 30 15:50:43 [initandlisten] connection accepted from 127.0.0.1:57683 #14
Mon Jan 30 15:50:46 [rsHealthPoll] replSet member localhost:27017 is now in state STARTUP2
Mon Jan 30 15:50:46 [rsMgr] not electing self, localhost:27017 would veto
Mon Jan 30 15:50:46 [rsMgr] not electing self, localhost:27017 would veto
Mon Jan 30 15:50:52 [rsMgr] replSet info electSelf 1
Mon Jan 30 15:50:52 [rsMgr] replSet PRIMARY
Mon Jan 30 15:50:54 [rsHealthPoll] replSet member localhost:27017 is now in state RECOVERING
Mon Jan 30 15:50:54 [initandlisten] connection accepted from 127.0.0.1:57687 #15
Mon Jan 30 15:50:59 [conn14] end connection 127.0.0.1:57683