Core Server / SERVER-54441

Long oplog recovery times after SIGABRT failures

    • Type: Question
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: 4.9.0-alpha4
    • Component/s: Storage
    • Replication

      Running the eMRCf_runner.sh tests with more than 7 growth iterations and enableMajorityReadConcern set to true results in a SIGABRT when shutting down the primary.

      The test deliberately shuts down the only secondary in a PSA replica set with enableMajorityReadConcern set to true, then runs a large, update-heavy workload (10 growth phases involve roughly 6,000,000 updates).

       

      In this scenario the Oplog Recovery phase takes a significant amount of time (~108 minutes):

      
      {"t":{"$date":"2021-01-11T02:42:11.819+00:00"},"s":"I",  "c":"REPL",     "id":21545,   "ctx":"initandlisten","msg":"Starting recovery oplog application at the stable timestamp","attr":{"stableTimestamp":{"$timestamp":{"t":1610324416,"i":1}}}}
      
      ...
      
      {"t":{"$date":"2021-01-11T04:30:53.247+00:00"},"s":"I",  "c":"REPL",     "id":21536,   "ctx":"initandlisten","msg":"Completed oplog application for recovery","attr":{"numOpsApplied":114391580,"numBatches":22879,"applyThroughOpTime":{"ts":{"$timestamp":{"t":1610331306,"i":2}},"t":2}}} 
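      The elapsed time and apply rate can be read straight off the two log entries above. The following is a minimal sketch, not part of the test harness: the helper names (wall_clock, summarize) are hypothetical, and it assumes the two lines are the structured JSON log entries with ids 21545 and 21536 exactly as quoted.

```python
import json
from datetime import datetime

# The two log entries quoted above, copied verbatim.
START_LINE = '{"t":{"$date":"2021-01-11T02:42:11.819+00:00"},"s":"I",  "c":"REPL",     "id":21545,   "ctx":"initandlisten","msg":"Starting recovery oplog application at the stable timestamp","attr":{"stableTimestamp":{"$timestamp":{"t":1610324416,"i":1}}}}'
END_LINE = '{"t":{"$date":"2021-01-11T04:30:53.247+00:00"},"s":"I",  "c":"REPL",     "id":21536,   "ctx":"initandlisten","msg":"Completed oplog application for recovery","attr":{"numOpsApplied":114391580,"numBatches":22879,"applyThroughOpTime":{"ts":{"$timestamp":{"t":1610331306,"i":2}},"t":2}}}'


def wall_clock(entry):
    """Return the wall-clock time of a parsed log entry (hypothetical helper)."""
    return datetime.fromisoformat(entry["t"]["$date"])


def summarize(start_line, end_line):
    """Derive recovery duration and apply rate from the start/end log lines."""
    start = json.loads(start_line)
    end = json.loads(end_line)

    elapsed = (wall_clock(end) - wall_clock(start)).total_seconds()
    ops = end["attr"]["numOpsApplied"]
    batches = end["attr"]["numBatches"]

    # Seconds of oplog covered: stable timestamp -> applyThrough timestamp.
    stable_t = start["attr"]["stableTimestamp"]["$timestamp"]["t"]
    through_t = end["attr"]["applyThroughOpTime"]["ts"]["$timestamp"]["t"]

    return {
        "elapsed_seconds": elapsed,
        "elapsed_minutes": elapsed / 60,
        "ops_per_second": ops / elapsed,
        "ops_per_batch": ops / batches,
        "oplog_span_seconds": through_t - stable_t,
    }


if __name__ == "__main__":
    for key, value in summarize(START_LINE, END_LINE).items():
        print(f"{key}: {value:,.1f}")
```

      For the entries above this works out to roughly 108.7 minutes of recovery, on the order of 17,500 operations applied per second, and about 6,890 seconds of oplog between the stable timestamp and the applyThrough timestamp.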

       

       Given that this is a PSA configuration, the replica set will not be available during this recovery. Is this amount of time expected for this case?

            Assignee: backlog-server-repl [DO NOT USE] Backlog - Replication Team
            Reporter: James O'Leary (jim.oleary@mongodb.com)