-
Type: Bug
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: 4.2.3
-
Component/s: Replication
-
Fully Compatible
-
ALL
-
v4.4, v4.2, v4.0
-
Repl 2020-06-01, Repl 2020-06-15, Repl 2020-06-29
When a node loses a dry run election, it logs the message
Lost dry run election due to internal error
This is misleading, at least in the cases I saw - it loses the dry run election for a pretty standard reason (new term started, not enough votes). This is not really an "internal error", just the mechanics of elections.
This also applies with LOGV2, where we have "Lost election due to internal error" with an "isDryRun" attribute, but it's still misleading. This is usually not (is never?) an internal error:
https://github.com/mongodb/mongo/blob/edb8778350326d2b33f056b1b5f0b25a4b5b444a/src/mongo/db/repl/replication_coordinator_impl_elect_v1.cpp#L58
Here are two examples of this error message
2020-03-26T17:53:53.115+0000 I ELECTION [replexec-37] Starting an election, since we've seen no PRIMARY in the past 5000ms 2020-03-26T17:53:53.115+0000 I ELECTION [replexec-37] conducting a dry run election to see if we could be elected. current term: 132 2020-03-26T17:53:53.116+0000 I REPL [replexec-37] Scheduling remote command request for vote request: RemoteCommand 1091683059 -- target:myreplset-00-abcde.mongodb.net:27017 db:admin cmd:{ replSetRequestVotes: 1, setName: "MyReplset", dryRun: true, term: 132, candidateIndex: 1, configVersion: 7, lastCommittedOp: { ts: Timestamp(1585245224, 200), t: 132 } } 2020-03-26T17:53:53.116+0000 I REPL [replexec-37] Scheduling remote command request for vote request: RemoteCommand 1091683060 -- target:myreplset-02-abcde.mongodb.net:27017 db:admin cmd:{ replSetRequestVotes: 1, setName: "MyReplset", dryRun: true, term: 132, candidateIndex: 1, configVersion: 7, lastCommittedOp: { ts: Timestamp(1585245224, 200), t: 132 } } ... 2020-03-26T17:53:53.121+0000 I ELECTION [replexec-38] VoteRequester(term 132 dry run) received a no vote from myreplset-00-abcde.mongodb.net:27017 with reason "candidate's data is staler than mine. candidate's last applied OpTime: { ts: Timestamp(1585245224, 200), t: 132 }, my last applied OpTime: { ts: Timestamp(1585245233, 27), t: 132 }"; response message: { term: 132, voteGranted: false, reason: "candidate's data is staler than mine. candidate's last applied OpTime: { ts: Timestamp(1585245224, 200), t: 132 }, my last applied OpTime: { ts: Times...", ok: 1.0, $clusterTime: { clusterTime: Timestamp(1585245233, 27), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, operationTime: Timestamp(1585245233, 27) } 2020-03-26T17:53:53.121+0000 I ELECTION [replexec-39] VoteRequester(term 132 dry run) received a no vote from myreplset-02-abcde.mongodb.net:27017 with reason "candidate's data is staler than mine. candidate's last applied OpTime: { ts: Timestamp(1585245224, 200), t: 132 }, my last applied OpTime: { ts: Timestamp(1585245233, 27), t: 132 }"; response message: { term: 132, voteGranted: false, reason: "candidate's data is staler than mine. candidate's last applied OpTime: { ts: Timestamp(1585245224, 200), t: 132 }, my last applied OpTime: { ts: Times...", ok: 1.0, $clusterTime: { clusterTime: Timestamp(1585245233, 27), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, operationTime: Timestamp(1585245233, 27) } 2020-03-26T17:53:53.121+0000 I ELECTION [replexec-39] not running for primary, we received insufficient votes 2020-03-26T17:53:53.121+0000 I ELECTION [replexec-39] Lost dry run election due to internal error
2020-03-27T02:37:15.225+0000 I ELECTION [replexec-52] VoteRequester(term 132 dry run) failed to receive response from myreplset-02-abcde.mongodb.net:27017: NetworkInterfaceExceededTimeLimit: Request 1115132387 timed out, deadline was 2020-03-27T02:37:15.225+0000, op was RemoteCommand 1115132387 -- target:[myreplset-02-abcde.mongodb.net:27017] db:admin expDate:2020-03-27T02:37:15.225+0000 cmd:{ replSetRequestVotes: 1, setName: "MyReplset", dryRun: true, term: 132, candidateIndex: 1, configVersion: 7, lastCommittedOp: { ts: Timestamp(1585276624, 7), t: 132 } } 2020-03-27T02:37:15.225+0000 I ELECTION [replexec-52] not running for primary, we have been superseded already during dry run. original term: 132, current term: 133 2020-03-27T02:37:15.225+0000 I ELECTION [replexec-52] Lost dry run election due to internal error