Core Server / SERVER-47142

Check primary before writing replset config and no-op

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 4.4.0-rc3, 4.7.0
    • Affects Version/s: None
    • Component/s: Replication
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v4.4
    • Sprint: Repl 2020-04-06, Repl 2020-04-20
    • Linked BF Score: 42

      There are currently two problems.

      1) We do not check whether we are still primary before writing a new config document locally. Consider the following scenario:

      • Node1 receives a reconfig command
      • Node1 begins stepping down because it hears of a new term
      • Node1 starts killing two kinds of operations: writes (and some system ops) that hold the global lock in X, IX, or S mode, and reads that encounter prepare conflicts. The replSetReconfig command falls into neither category.
      • Node1 finishes killing ops and steps down, transitioning to secondary
      • Node1 writes the new config document. This takes the DB lock in X mode, but the write is not killed because the stepdown has already finished.

      Node1's new config will still be propagated to other nodes via heartbeats, even though Node1 has already stepped down.
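      To make the ordering concrete, here is a minimal C++ model of the stepdown sweep described in the scenario. Everything in it (GlobalLockMode, Operation, killedByStepDown, stepDownSweep) is hypothetical and not the server's real code. It assumes the sweep only inspects operations as they exist at the moment it runs, so an operation that takes its lock afterwards is never considered; the "some system ops" case is left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the stepdown sweep; not MongoDB's real types.
enum class GlobalLockMode { kNone, kIS, kIX, kS, kX };

struct Operation {
    std::string name;
    GlobalLockMode heldMode = GlobalLockMode::kNone;  // global lock held when the sweep runs
    bool isWrite = false;
    bool hitPrepareConflict = false;  // only meaningful for reads
    bool killed = false;
};

// Paraphrase of the ticket's rule: kill writes holding the global lock in
// X, IX, or S mode, and reads that encountered a prepare conflict.
bool killedByStepDown(const Operation& op) {
    const bool holdsKillableLock = op.heldMode == GlobalLockMode::kX ||
                                   op.heldMode == GlobalLockMode::kIX ||
                                   op.heldMode == GlobalLockMode::kS;
    if (op.isWrite) {
        return holdsKillableLock;
    }
    return op.hitPrepareConflict;
}

// Assumption: stepdown is a one-shot pass over the operations that exist now.
// Anything that acquires a lock after this returns is never revisited.
void stepDownSweep(std::vector<Operation>& ops) {
    for (Operation& op : ops) {
        if (killedByStepDown(op)) {
            op.killed = true;
        }
    }
}
```

      In this model, a reconfig that has not taken any lock yet when the sweep runs (for example Operation{"replSetReconfig", GlobalLockMode::kNone, true}) survives the sweep, and nothing stops it from later taking the DB lock and writing its config on what is now a secondary.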

      2) The replSetReconfig command does a no-op write, but does not check that the node is still primary before doing so. (readConcern: linearizable is a similar example.)

      The no-op write goes through onInternalOpMessage, which passes in an empty namespace. Because of this, _logOpsInner skips its primary check, so the reconfig no-op write can be allowed to occur on a secondary.
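      A minimal sketch of why the empty namespace matters, assuming the primary check in _logOpsInner only runs when a non-empty namespace is supplied. logOpsInnerModel, onInternalOpMessageModel and ReplState are hypothetical stand-ins, not the real functions or their signatures.

```cpp
#include <cassert>
#include <string>

// Hypothetical replica-set state; not MongoDB's real types.
struct ReplState {
    bool primary;
};

// Stand-in for _logOpsInner. Assumption: the primary check is gated on the
// namespace, so an empty namespace bypasses it entirely.
// Returns true if the oplog entry would be written.
bool logOpsInnerModel(const ReplState& state, const std::string& nss) {
    if (!nss.empty() && !state.primary) {
        return false;  // rejected: not primary
    }
    return true;  // the oplog entry would be appended here
}

// Stand-in for onInternalOpMessage, which (per the ticket) passes an empty namespace.
bool onInternalOpMessageModel(const ReplState& state) {
    return logOpsInnerModel(state, std::string{});
}
```

      For a secondary, logOpsInnerModel(ReplState{false}, "test.coll") returns false, while onInternalOpMessageModel(ReplState{false}) returns true: the no-op write slips through on a secondary.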

      Since the config write and the no-op write should happen together to avoid inconsistent states, we should consider refactoring the code so that the primary check is done only once for both.
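      One possible shape for such a refactor, sketched under simplifying assumptions: a single std::mutex stands in for whatever lock would serialize reconfig against stepdown in the server, and NodeModel is a hypothetical class, not the replication coordinator. The primary check, the config write and the no-op write all happen under the same lock, so a stepdown cannot land between them.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical node model; the mutex stands in for whatever lock the server
// would use to serialize reconfig against stepdown.
class NodeModel {
public:
    // Checks primary once, then does both writes under the same lock.
    // Returns false, writing nothing, if the node is no longer primary.
    bool reconfig(const std::string& newConfig) {
        std::lock_guard<std::mutex> lk(mutex_);
        if (!primary_) {
            return false;
        }
        config_ = newConfig;                  // 1) persist the config document locally
        oplog_.push_back("no-op: reconfig");  // 2) the reconfig no-op write
        return true;
    }

    // Stepdown takes the same lock, so it cannot run between the check and the writes.
    void stepDown() {
        std::lock_guard<std::mutex> lk(mutex_);
        primary_ = false;
    }

    std::string config() const {
        std::lock_guard<std::mutex> lk(mutex_);
        return config_;
    }

    std::size_t oplogSize() const {
        std::lock_guard<std::mutex> lk(mutex_);
        return oplog_.size();
    }

private:
    mutable std::mutex mutex_;
    bool primary_ = true;
    std::string config_ = "initial";
    std::vector<std::string> oplog_;
};
```

      Once stepDown() has run, reconfig() returns false and neither the config nor the oplog changes. In a real server, holding a lock across both writes would also have to cooperate with stepdown's op killing; this model leaves that out.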

            Assignee: Siyuan Zhou (siyuan.zhou@mongodb.com)
            Reporter: Pavithra Vetriselvan (pavithra.vetriselvan@mongodb.com)