-
Type: Bug
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: Replication
-
None
-
Fully Compatible
-
ALL
-
v4.0, v3.6
-
Repl 2018-07-16, Repl 2018-07-30, Repl 2018-08-13
-
17
When a candidate tries to run for election in a new term, it may try to seek a vote from a current primary in an older term. If it wins the dry run election, it increments its local term and then sends out a vote request in the new term. It is possible that this vote request command ends up running concurrently with another command sent from this candidate; for example, an updatePosition request. We currently don't appear to pause updatePosition requests when we become a candidate, so a poorly timed updatePosition request sent after we become a candidate may get sent to the old primary and cause it to step down, since it is sent with a higher term. If this command triggers a step down while running concurrently with a vote request, it may cause the vote request to fail, causing the candidate to potentially lose that election since if it required the vote of that node. This would mean it may need to wait one more election timeout before it tries to get elected again.
This issue appeared because it seems that we check for generic interruptions in the common command processing codepath. Perhaps we want to make these interrupt checks immune to certain interruption types e.g. InterruptedDueToReplStepDown.
- is related to
-
SERVER-34682 Old primary should vote yes and store the last vote after stepdown on learning of a higher term
- Closed