Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-36746

A failed step down attempt shouldn't unconditionally reset LeaderMode to kMaster

    • Type: Icon: Bug Bug
    • Resolution: Fixed
    • Priority: Icon: Major - P3 Major - P3
    • 3.6.9, 4.0.3, 4.1.3
    • Affects Version/s: None
    • Component/s: Replication
    • None
    • Fully Compatible
    • ALL
    • v4.0, v3.6
    • Repl 2018-09-10
    • 0

      When a node initiates a step down attempt, it will execute the ReplicationCoordinatorImpl::stepDown method. This method executes a number of pre-condition checks and waits to acquire the global lock. Then, it will prepare for the step down attempt and set the current leader mode to kAttemptingStepDown. Next, it sets up logic to run if the stepdown attempt fails. If the attempt fails, we will trigger this block and set our current leader mode to kMaster. After doing this, we will call updateMemberStateFromTopologyCoordinator_inlock, which, if our leader mode is currently kMaster will set the canAcceptNonLocalWrites flag to true. This is incorrect, though, since when we started our step down attempt, there is no guarantee that we were actually in leader mode kMaster. We check that our current member state is PRIMARY, but we may have still been in drain mode, and therefore unable to accept writes, meaning our leader mode would have still been in kLeaderElect. If we fail the step down attempt and don't actually step down, we shouldn't automatically set the leader mode to kMaster. We should leave ourselves in kLeaderElect if that's the state we were in when we started the stepdown attempt.

        1. draining_primary_accepts_writes.js
          4 kB
          William Schultz
        2. draining_primary_accepts_writes_fassert.js
          4 kB
          William Schultz

            Assignee:
            spencer@mongodb.com Spencer Brody (Inactive)
            Reporter:
            william.schultz@mongodb.com William Schultz (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated:
              Resolved: