  1. Core Server
  2. SERVER-38505

For pv1, to determine if the oplog entries are applied out of order, we should compare both the term and timestamp of firstOpTimeInBatch and lastAppliedOpTimeAtStartOfBatch

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major - P3
    • Affects Version/s: 3.6.9
    • Component/s: None
    • Replication
    • ALL

      Consider the following upgrade->downgrade->upgrade (pv1->pv0->pv1) sequence.

      1) Start a replica set in pv1.

      2) Insert some documents in pv1 (term = 1).

      3) Downgrade to pv0 while the secondaries are still replicating the documents from the previous pv1 (term = 1).

      4) Upgrade to pv1 before the secondaries downgrade to pv0.

      5) The secondaries learn the new term (term 0) from a heartbeat received from the primary while their lastAppliedOpTimes are still in term 1.

      6) Let's say that on a secondary, the node's lastAppliedOpTime and lastFetchedOpTime are both (100, t:1). When the secondary tries to replicate oplog entries from the primary, it adds a filter to the find command to fetch only the oplog entries with a timestamp greater than or equal to the timestamp of its lastFetchedOpTime, "100". When the secondary receives a batch combining oplog entries from step 2 (pv1), step 3 (pv0) and step 4 (pv1), say (100, t:1) | (101, t:1) | (102, t:-1) | (103, t:0), it applies those entries and tries to move its lastAppliedOpTime forward to the last entry in the batch, (103, t:0). Unfortunately, it can't move its lastAppliedOpTime forward, because the term is compared before the timestamp and so (103, t:0) < (100, t:1).

      7) Assume that the secondary receives the next batch, starting with (104, t:0). Before applying the batch, it verifies that oplog entries are not applied out of order by checking that the optime of the first entry in the batch is greater than the lastAppliedOpTime. Since (104, t:0) is less than the lastAppliedOpTime (100, t:1), the check fails and leads to an fassert failure.
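
      The comparisons in steps 6 and 7 can be sketched as follows. This is only an illustration, not the server code: the OpTime class and the helper names advance_last_applied and batch_is_in_order are hypothetical, timestamps are simplified to integers, and optimes are assumed to order by term first and timestamp second, as the comparison in step 6 implies. How the pv0 term (-1) is compared is not modeled, since only term 0 and term 1 optimes are compared here.

```python
from dataclasses import dataclass
from functools import total_ordering


@total_ordering
@dataclass(frozen=True)
class OpTime:
    """Hypothetical, simplified optime: (timestamp, term)."""
    ts: int
    term: int

    def __lt__(self, other):
        # Assumption: the term is compared before the timestamp.
        return (self.term, self.ts) < (other.term, other.ts)


def advance_last_applied(last_applied, batch):
    """Step 6: move lastAppliedOpTime forward only if the batch ends later."""
    candidate = batch[-1]
    return candidate if candidate > last_applied else last_applied


def batch_is_in_order(last_applied_at_start, first_in_batch):
    """Step 7: the first entry must be newer than lastAppliedOpTime."""
    return first_in_batch > last_applied_at_start


last_applied = OpTime(100, 1)
batch = [OpTime(100, 1), OpTime(101, 1), OpTime(102, -1), OpTime(103, 0)]

# Step 6: (103, t:0) < (100, t:1), so lastAppliedOpTime stays at (100, t:1).
last_applied = advance_last_applied(last_applied, batch)

# Step 7: (104, t:0) is not greater than (100, t:1), so the check fails.
next_batch_ok = batch_is_in_order(last_applied, OpTime(104, 0))
```

      Under these assumptions, last_applied remains (100, t:1) after the mixed batch and next_batch_ok is False, which corresponds to the out-of-order fassert described in step 7.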

       Here we see two problems:

          1) Step 6, where the lastAppliedOpTime does not move forward because a batch mixes oplog entries from the previous pv1, pv0, and the new pv1.

          2) Step 7, where we get an fassert failure stating that oplog entries are applied out of order.

      Problem 2 won't occur, as we would hit an invariant during step 6 while trying to move our lastAppliedOpTime forward (see SERVER-35608).

            Assignee:
            backlog-server-repl [DO NOT USE] Backlog - Replication Team
            Reporter:
            suganthi.mani@mongodb.com Suganthi Mani