-
Type: Bug
-
Resolution: Done
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: Replication
-
None
-
Fully Compatible
-
ALL
-
Repl 2017-04-17
-
0
During ReplSetTest initialization, when elections happen while adding nodes to the replica set, it is possible for node 0 (the intended primary node after initialization) to lag the other nodes before it is asked to step up.
this.stepUp(this.nodes[0]); } // Setup authentication if running test with authentication if ((jsTestOptions().keyFile) && cmdKey == 'replSetInitiate') { master = this.getPrimary(); jsTest.authenticateNodes(this.nodes); } }; this.stepUp = function(node) { this.awaitSecondaryNodes(); this.awaitNodesAgreeOnPrimary(); if (this.getPrimary() === node) { return; }
This may result in the step up phase of ReplSetTest,initiate() failing to elect node 0 with a "candidate's data is staler than mine" reason.
[js_test:replsetprio1] 2017-03-15T13:03:02.788+0000 d20010| 2017-03-15T13:03:02.787+0000 I REPL [conn2] stepping down from primary, because a new term has begun: 2 [js_test:replsetprio1] 2017-03-15T13:03:02.788+0000 d20010| 2017-03-15T13:03:02.787+0000 I REPL [replExecDBWorker-2] transition to SECONDARY ... [js_test:replsetprio1] 2017-03-15T13:03:02.802+0000 d20010| 2017-03-15T13:03:02.799+0000 D REPL_HB [conn2] Received heartbeat request from mci-osx1010-10.build.10gen.cc:20011, { replSetHeartbeat: "testSet", configVersion: 2, from: "mci-osx1010-10.build.10gen.cc:20011", fromId: 1, term: 2 } [js_test:replsetprio1] 2017-03-15T13:03:02.802+0000 d20010| 2017-03-15T13:03:02.799+0000 D REPL_HB [conn2] Processed heartbeat from mci-osx1010-10.build.10gen.cc:20011 and generated response, { ok: 1.0, state: 2, v: 2, hbmsg: "", set: "testSet", term: 1, durableOpTime: { ts: Timestamp 1489582961000|2, t: 1 }, opTime: { ts: Timestamp 1489582961000|2, t: 1 } } ... [js_test:replsetprio1] 2017-03-15T13:03:03.315+0000 d20010| 2017-03-15T13:03:03.315+0000 D REPL_HB [ReplicationExecutor] Sending heartbeat (requestId: 52) to mci-osx1010-10.build.10gen.cc:20012, { replSetHeartbeat: "testSet", configVersion: 2, from: "mci-osx1010-10.build.10gen.cc:20010", fromId: 0, term: 1 } [js_test:replsetprio1] 2017-03-15T13:03:03.316+0000 d20010| 2017-03-15T13:03:03.315+0000 D REPL_HB [ReplicationExecutor] Sending heartbeat (requestId: 53) to mci-osx1010-10.build.10gen.cc:20011, { replSetHeartbeat: "testSet", configVersion: 2, from: "mci-osx1010-10.build.10gen.cc:20010", fromId: 0, term: 1 } ... [js_test:replsetprio1] 2017-03-15T13:03:03.318+0000 d20010| 2017-03-15T13:03:03.316+0000 D REPL_HB [ReplicationExecutor] Received response to heartbeat (requestId: 52) from mci-osx1010-10.build.10gen.cc:20012, { ok: 1.0, state: 2, v: 2, hbmsg: "", set: "testSet", term: 2, primaryId: 0, durableOpTime: { ts: Timestamp 1489582961000|2, t: 1 }, opTime: { ts: Timestamp 1489582961000|2, t: 1 }, operationTime: Timestamp 1489582982000|1 } [js_test:replsetprio1] 2017-03-15T13:03:03.318+0000 d20010| 2017-03-15T13:03:03.316+0000 I REPL [ReplicationExecutor] Member mci-osx1010-10.build.10gen.cc:20012 is now in state SECONDARY [js_test:replsetprio1] 2017-03-15T13:03:03.318+0000 d20010| 2017-03-15T13:03:03.316+0000 D REPL_HB [ReplicationExecutor] Scheduling heartbeat to mci-osx1010-10.build.10gen.cc:20012 at 2017-03-15T13:03:08.316Z [js_test:replsetprio1] 2017-03-15T13:03:03.319+0000 d20010| 2017-03-15T13:03:03.316+0000 D REPL_HB [ReplicationExecutor] Received response to heartbeat (requestId: 53) from mci-osx1010-10.build.10gen.cc:20011, { ok: 1.0, electionTime: new Date(6397710192368156673), state: 1, v: 2, hbmsg: "", set: "testSet", term: 2, primaryId: 1, durableOpTime: { ts: Timestamp 1489582961000|2, t: 1 }, opTime: { ts: Timestamp 1489582961000|2, t: 1 }, operationTime: Timestamp 1489582982000|1 } [js_test:replsetprio1] 2017-03-15T13:03:03.319+0000 d20010| 2017-03-15T13:03:03.316+0000 I REPL [ReplicationExecutor] Member mci-osx1010-10.build.10gen.cc:20011 is now in state PRIMARY [js_test:replsetprio1] 2017-03-15T13:03:03.319+0000 d20010| 2017-03-15T13:03:03.316+0000 D REPL_HB [ReplicationExecutor] Scheduling heartbeat to mci-osx1010-10.build.10gen.cc:20011 at 2017-03-15T13:03:08.316Z ... [js_test:replsetprio1] 2017-03-15T13:03:07.781+0000 d20010| 2017-03-15T13:03:07.781+0000 I COMMAND [conn11] Received replSetStepUp request [js_test:replsetprio1] 2017-03-15T13:03:07.781+0000 d20010| 2017-03-15T13:03:07.781+0000 I REPL [conn11] Starting an election due to step up request [js_test:replsetprio1] 2017-03-15T13:03:07.782+0000 d20010| 2017-03-15T13:03:07.781+0000 I REPL [conn11] conducting a dry run election to see if we could be elected [js_test:replsetprio1] 2017-03-15T13:03:07.782+0000 d20010| 2017-03-15T13:03:07.781+0000 I REPL [ReplicationExecutor] VoteRequester(term 2 dry run) received a no vote from mci-osx1010-10.build.10gen.cc:20011 with reason "candidate's data is staler than mine"; response message: { term: 2, voteGranted: false, reason: "candidate's data is staler than mine", ok: 1.0, operationTime: Timestamp 1489582984000|1 } [js_test:replsetprio1] 2017-03-15T13:03:07.783+0000 d20010| 2017-03-15T13:03:07.782+0000 I REPL [ReplicationExecutor] VoteRequester(term 2 dry run) received a no vote from mci-osx1010-10.build.10gen.cc:20012 with reason "candidate's data is staler than mine"; response message: { term: 2, voteGranted: false, reason: "candidate's data is staler than mine", ok: 1.0, operationTime: Timestamp 1489582984000|1 } [js_test:replsetprio1] 2017-03-15T13:03:07.783+0000 d20010| 2017-03-15T13:03:07.782+0000 I REPL [ReplicationExecutor] not running for primary, we received insufficient votes [js_test:replsetprio1] 2017-03-15T13:03:07.783+0000 d20010| 2017-03-15T13:03:07.782+0000 I COMMAND [conn11] replSetStepUp request failed :: caused by :: CommandFailed: Election failed.
- is depended on by
-
SERVER-28699 ReplSetTest should always awaitSecondaryNodes() at the end of initiate
- Closed
- is duplicated by
-
SERVER-28273 replsettest.js needs to authenticate before stepping up node 0
- Closed
- is related to
-
SERVER-27785 Make external authorization tests not auto-detect wcMajorityJournalDefault
- Closed
-
SERVER-27107 Update 'last-stable' MongoRunner alias to "3.4" in master branch
- Closed
-
SERVER-14017 Refactor ShardingTest and ReplSetTest objects to use separate control and test connections
- Backlog
-
SERVER-21714 Increase replSetTest.initiate() timeout for FSM tests
- Closed
-
SERVER-27490 Stop consulting storage engine isDurable flag in ReplicationCoordinatorImpl and simplify ReplCoordTestFixture
- Closed
-
SERVER-27810 Guarantee that replicaset is stable with node 0 as primary after ReplSetTest.initiate()
- Closed