- Type: Improvement
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Replication
- None
- Fully Compatible
- v4.4
- Repl 2020-06-29, Repl 2020-07-13, Repl 2020-07-27
- 9
During the reconfig quorum check, if we learn that another node has a newer config, we fail the reconfig command with NewReplicaSetConfigurationIncompatible.
This extra check seems unnecessary with the safe reconfig protocol.
The error is also confusing in a concurrent stepdown/reconfig scenario:
- We have a 5 node replica set, with three voting nodes (node0, node2, and node4)
- The current config is {version: 22, term: 10} and the current primary is node2
- We step up node0, and it runs for an election in term 11
- Node2 receives a reconfig command for {version: 23, term: 10}
- Node2 steps down because it hears of a new term, 11, via a vote request from node0. Note that during stepdown, we do not kill the reconfig command unless it is writing down the config document (which takes a DB X lock).
- Node0 wins the election (with votes from node2 and node4) and successfully increments the term on step up. The current config is {version: 22, term: 11}
- Node2 does not install the newer config since it's already in the midst of a reconfig
- Finally, Node2 fails during its quorum check because Node0 already has a newer config.
If we remove the quorum check, the reconfig will instead fail later in the protocol. This is still safe and also returns a more accurate error (NotMaster).
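To illustrate why Node0's config counts as newer in the scenario above, here is a minimal Python sketch of the comparison. It assumes configs are ordered by term first and version second, which is what the scenario implies ({version: 22, term: 11} beats {version: 23, term: 10}). The `ConfigId` class and `is_newer_than` method are hypothetical names for illustration, not the server's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigId:
    """Hypothetical stand-in for a replica set config's (version, term) pair."""
    version: int
    term: int

    def is_newer_than(self, other: "ConfigId") -> bool:
        # Assumption: term takes precedence, version breaks ties.
        return (self.term, self.version) > (other.term, other.version)


# Config that Node2 is trying to install via the reconfig command.
proposed = ConfigId(version=23, term=10)

# Config on Node0 after it wins the election and bumps the term.
node0_config = ConfigId(version=22, term=11)

# The quorum check rejects the reconfig because a contacted node
# already holds a config that outranks the proposed one.
quorum_check_fails = node0_config.is_newer_than(proposed)
print(quorum_check_fails)  # True
```

Under this ordering the higher version number on the proposed config does not help, because the term has already moved on. That is why the underlying problem is that Node2 is no longer primary rather than that the config is incompatible.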
- related to SERVER-47948 Replica set reconfig quorum check should compare configs based on version and term (Closed)