Core Server / SERVER-38366

Replica set nodes updating the term without verifying the config version can lead to unnecessary stepdown.

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major - P3
    • None
    • Affects Version/s: 4.1.5
    • Component/s: Replication
    • None
    • Replication
    • ALL

      Currently, replica set nodes can learn about a higher term via heartbeats, the oplog fetcher, and commands (such as find and getMore). When the term is learned via the oplog fetcher, the node calls ReplicationCoordinatorImpl::_processReplSetMetadata_inlock, which updates the term only if the sync source's config version is the same as the node's own. That config version check is missing in the heartbeat, find, and getMore paths before the term is updated.
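The guard described above can be sketched as a toy model. This is a minimal Python illustration, not the server's C++ code: the `Node` class, the `maybe_update_term` function, and all field names are hypothetical and exist only to show the shape of the check that the oplog fetcher path applies and the other paths lack.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy model of a replica set member (hypothetical, not server state)."""
    term: int
    config_version: int
    is_primary: bool = False


def maybe_update_term(node: Node, remote_term: int, remote_config_version: int) -> bool:
    """Adopt a higher remote term only when the config versions match.

    Returns True if the term was updated. In this model a primary that
    adopts a higher term steps down.
    """
    if remote_config_version != node.config_version:
        return False  # sender is on a different config; ignore its term
    if remote_term <= node.term:
        return False  # nothing newer to learn
    node.term = remote_term
    node.is_primary = False  # a primary steps down on seeing a higher term
    return True
```

For example, a primary with `term=0, config_version=3` would ignore a term of 1 reported under config version 2, but would adopt it (and step down) if the sender reported config version 3.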

      Note also that ReplicationCoordinatorImpl::_handleHeartbeatResponse updates the term in two places.

       

      Note: This bug was observed during the following upgrade/downgrade sequence (pv1->pv0->pv1), where it led to an unnecessary stepdown.

      1) Start a replica set in pv1.

      2) Insert some documents in pv1 (term = 1).

      3) Downgrade to pv0 while the secondaries are still replicating the documents from the previous pv1 (term = 1).

      4) Upgrade to pv1 before the secondaries downgrade to pv0.

      5) The current primary, which is in term 0, receives heartbeats from the secondaries, which think they are still in term 1 (from step 1).

      6) As a result, the current primary updates its term to 1, steps down, and starts a new election for term 2.
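Steps 5 and 6 can be replayed with a small simulation. This is a hedged sketch in Python rather than the server's C++: `heartbeat_term_update` is a hypothetical helper, and the config version numbers (3 for the primary, 1 for the lagging secondary) are illustrative assumptions, since the ticket does not state them.

```python
def heartbeat_term_update(primary_term, primary_cfg, sender_term, sender_cfg, check_config):
    """Return (new_term, stepped_down) for a primary processing one heartbeat.

    check_config=False models the behavior reported in the ticket;
    check_config=True models the oplog fetcher's config version guard.
    """
    if check_config and sender_cfg != primary_cfg:
        return primary_term, False  # stale config: ignore the sender's term
    if sender_term > primary_term:
        return sender_term, True  # higher term seen: adopt it and step down
    return primary_term, False


# Step 5: the primary is in term 0 on the newest config, while a secondary
# still reports term 1 from the original pv1 config (versions are assumed).
unchecked = heartbeat_term_update(0, 3, 1, 1, check_config=False)
checked = heartbeat_term_update(0, 3, 1, 1, check_config=True)

print(unchecked)  # (1, True)
print(checked)    # (0, False)
```

Without the check, the primary in this model adopts term 1 and steps down, matching step 6. With the check, the stale heartbeat is ignored. Whether the same guard is appropriate on the heartbeat path in the real server is not settled by this ticket.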

            Assignee:
            backlog-server-repl [DO NOT USE] Backlog - Replication Team
            Reporter:
            suganthi.mani@mongodb.com Suganthi Mani