Core Server / SERVER-21062

A REMOVED node that is ahead of the other nodes in the set can prevent a primary from being elected

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • 3.2.0-rc1
    • Affects Version/s: 3.2.0-rc0
    • Component/s: None
    • None
    • Fully Compatible
    • ALL

      Following an upgrade of mmapv1 SCCC config servers to CSRS, I sometimes (about 60% of the time) see the new replica set get stuck without a primary. This happens after the first config server is restarted without --configsvrMode=sccc set and enters the REMOVED state. The remaining 3 replica set members stay in SECONDARY state.

      This is with commit dbbc9a2e3d8c4d7fe1748fa980ba7d01b9489dbe.

      rs.status():

      csrs:REMOVED> rs.status()
      {
      	"set" : "csrs",
      	"date" : ISODate("2015-10-21T21:51:22.697Z"),
      	"myState" : 10,
      	"term" : NumberLong(1),
      	"configsvr" : true,
      	"heartbeatIntervalMillis" : NumberLong(2000),
      	"members" : [
      		{
      			"_id" : 0,
      			"name" : "neurofunk.local:9007",
      			"health" : 1,
      			"state" : 10,
      			"stateStr" : "REMOVED",
      			"uptime" : 53,
      			"optime" : {
      				"ts" : Timestamp(1445464229, 1),
      				"t" : NumberLong(1)
      			},
      			"optimeDate" : ISODate("2015-10-21T21:50:29Z"),
      			"infoMessage" : "could not find member to sync from",
      			"configVersion" : 3,
      			"self" : true
      		},
      		{
      			"_id" : 1,
      			"name" : "neurofunk.local:53836",
      			"health" : 1,
      			"state" : 2,
      			"stateStr" : "SECONDARY",
      			"uptime" : 52,
      			"optime" : {
      				"ts" : Timestamp(1445464217, 1),
      				"t" : NumberLong(1)
      			},
      			"optimeDate" : ISODate("2015-10-21T21:50:17Z"),
      			"lastHeartbeat" : ISODate("2015-10-21T21:51:22.161Z"),
      			"lastHeartbeatRecv" : ISODate("2015-10-21T21:51:22.111Z"),
      			"pingMs" : NumberLong(0),
      			"configVersion" : 3
      		},
      		{
      			"_id" : 2,
      			"name" : "neurofunk.local:53835",
      			"health" : 1,
      			"state" : 2,
      			"stateStr" : "SECONDARY",
      			"uptime" : 52,
      			"optime" : {
      				"ts" : Timestamp(1445464217, 1),
      				"t" : NumberLong(1)
      			},
      			"optimeDate" : ISODate("2015-10-21T21:50:17Z"),
      			"lastHeartbeat" : ISODate("2015-10-21T21:51:22.161Z"),
      			"lastHeartbeatRecv" : ISODate("2015-10-21T21:51:22.111Z"),
      			"pingMs" : NumberLong(0),
      			"configVersion" : 3
      		},
      		{
      			"_id" : 4,
      			"name" : "neurofunk.local:53834",
      			"health" : 1,
      			"state" : 2,
      			"stateStr" : "SECONDARY",
      			"uptime" : 52,
      			"optime" : {
      				"ts" : Timestamp(1445464217, 1),
      				"t" : NumberLong(1)
      			},
      			"optimeDate" : ISODate("2015-10-21T21:50:17Z"),
      			"lastHeartbeat" : ISODate("2015-10-21T21:51:22.161Z"),
      			"lastHeartbeatRecv" : ISODate("2015-10-21T21:51:22.111Z"),
      			"pingMs" : NumberLong(0),
      			"configVersion" : 3
      		}
      	],
      	"ok" : 1,
      	"$gleStats" : {
      		"lastOpTime" : Timestamp(0, 0),
      		"electionId" : ObjectId("000000000000000000000000")
      	}
      }
      csrs:REMOVED> 
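
      In the output above, the REMOVED member (neurofunk.local:9007) reports optime Timestamp(1445464229, 1), while all three SECONDARY members report Timestamp(1445464217, 1), and no member is PRIMARY. As a rough illustration only, the check below flags that pattern in a simplified copy of the member list. The function names and the { term, secs, inc } shape are hypothetical and are not part of rs.status(). The ordering (term first, then timestamp) is an assumption for this sketch, not the server's election code.

```javascript
// Hypothetical helper: compare two simplified optimes of the form
// { term, secs, inc }. Assumes ordering by term first, then timestamp.
// Returns a negative number, zero, or a positive number.
function compareOpTimes(a, b) {
  if (a.term !== b.term) return a.term - b.term;
  if (a.secs !== b.secs) return a.secs - b.secs;
  return a.inc - b.inc;
}

// Hypothetical helper: return the names of REMOVED members whose optime is
// ahead of every SECONDARY member, but only when no member is PRIMARY.
function findRemovedMembersAhead(members) {
  const hasPrimary = members.some((m) => m.stateStr === "PRIMARY");
  if (hasPrimary) return [];
  const secondaries = members.filter((m) => m.stateStr === "SECONDARY");
  if (secondaries.length === 0) return [];
  return members
    .filter((m) => m.stateStr === "REMOVED")
    .filter((r) =>
      secondaries.every((s) => compareOpTimes(r.optime, s.optime) > 0)
    )
    .map((r) => r.name);
}

// Values copied by hand from the rs.status() output in this report.
const members = [
  { name: "neurofunk.local:9007", stateStr: "REMOVED",
    optime: { term: 1, secs: 1445464229, inc: 1 } },
  { name: "neurofunk.local:53836", stateStr: "SECONDARY",
    optime: { term: 1, secs: 1445464217, inc: 1 } },
  { name: "neurofunk.local:53835", stateStr: "SECONDARY",
    optime: { term: 1, secs: 1445464217, inc: 1 } },
  { name: "neurofunk.local:53834", stateStr: "SECONDARY",
    optime: { term: 1, secs: 1445464217, inc: 1 } },
];

console.log(findRemovedMembersAhead(members));
```

      With the values from this report, the check returns the single REMOVED member, neurofunk.local:9007. It only describes the observed state and does not explain why the secondaries fail to elect a primary.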
      

      I will attach logs.

        1. SERVER-21062.tar.gz
          55 kB
        2. original-configsvr1-pre-second-restart.log
          39 kB
        3. original-configsvr1-post-second-restart.log
          276 kB
        4. new-configsvr3.log
          49 kB
        5. new-configsvr2.log
          56 kB
        6. new-configsvr1.log
          51 kB

            Assignee:
            scotthernandez Scott Hernandez (Inactive)
            Reporter:
            tim.olsen@mongodb.com Timothy Olsen (Inactive)
            Votes:
            0
            Watchers:
            12
