Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-38089

Fatal error 40292 due to missing oplog entry at the start time stamp

    • Type: Icon: Bug Bug
    • Resolution: Cannot Reproduce
    • Priority: Icon: Critical - P2 Critical - P2
    • None
    • Affects Version/s: 4.0.1, 4.0.4
    • Component/s: Replication
    • ALL
    • Hide

      I have the various log, turtle, wt, and diagnostic data files. But I have not idea how to recreate it. 

      Am attaching all the pertinent data,  I could think of.

      • mongod-1.tgz: Secondary (with the problem)
      • mongod-0.tgz: Primary

      We are using monogdb cloud manager. The tar file contains logs of the automation agents as well.

      Since the automation agent keeps trying to restart the server, you'd find repetitive entries. It contains errors from 4.0.1 and from 4.0.4.

       

      Show
      I have the various log, turtle, wt, and diagnostic data files. But I have not idea how to recreate it.  Am attaching all the pertinent data,  I could think of. mongod-1.tgz: Secondary (with the problem) mongod-0.tgz: Primary We are using monogdb cloud manager. The tar file contains logs of the automation agents as well. Since the automation agent keeps trying to restart the server, you'd find repetitive entries. It contains errors from 4.0.1 and from 4.0.4.  

       

      A series of weird happenings have ended up in a situation triggering Fatal Error 40292 in mongo/db/repl/replication_recovery.cpp line 133.

      We have a PSA configuration. The entire issue started when we our DB approached (or hit) the disk limit. We resized the disk about mid October. From there on we've not been able to have a stable system. The servers keep going down.

      I got involved two days ago, so I don't have the entire history.

      I found that our system was triggering WT-4335, with version 4.0.1. I upgraded to 4.0.4 yesterday. I am guessing now the fatal error is triggering at the root cause. (The WT-4335 was triggering after all the oplog replay, now the error is at the start itself.)

      2018-11-12T05:41:22.612+0000 I FTDC [initandlisten] Initializing full-time diagnostic data capture with directory '/mongo/rs0_8/diagnostic.data'
      2018-11-12T05:41:22.614+0000 I REPL [initandlisten] Rollback ID is 12
      2018-11-12T05:41:22.614+0000 I REPL [initandlisten] Recovering from stable timestamp: Timestamp(1539794142, 116) (top of oplog: { ts: Timestamp(1541830881, 1), t: 85 }, appliedThrough: { ts: Timestamp(0, 0), t: -1 }, TruncateAfter: Timestamp(0, 0))
      2018-11-12T05:41:22.614+0000 I REPL [initandlisten] Starting recovery oplog application at the stable timestamp: Timestamp(1539794142, 116)
      2018-11-12T05:41:22.614+0000 I REPL [initandlisten] Replaying stored operations from { : Timestamp(1539794142, 116) } (exclusive) to { : Timestamp(1541830881, 1) } (inclusive).
      {{2018-11-12T05:41:22.617+0000 F REPL [initandlisten] Oplog entry at { : Timestamp(1539794142, 116) } is missing; actual entry found is { : Timestamp(1539794616, 312) }}}
      2018-11-12T05:41:22.617+0000 F - [initandlisten] Fatal Assertion 40292 at src/mongo/db/repl/replication_recovery.cpp 134
      {{2018-11-12T05:41:22.617+0000 F - [initandlisten] }}

      For reference, here is log with 4.0.1 (prior to the above error).

      2018-11-11T10:23:54.182+0000 I ROLLBACK [rsBackgroundSync] Rolling back to the stable timestamp. StableTimestamp: Timestamp(1539794142, 116) Initial Data Timestamp: Timestamp(1539794142, 116)2018-11-11T10:23:54.275+0000 E STORAGE [rsBackgroundSync] WiredTiger error (22) [1541931834:274955][23314:0x7f35f5bd7700], WT_CONNECTION.rollback_to_stable: rollback_to_stable illegal with active transactions: Invalid argument Raw: [1541931834:274955][23314:0x7f35f5bd7700], WT_CONNECTION.rollback_to_stable: rollback_to_stable illegal with active transactions: Invalid argument2018-11-11T10:23:54.275+0000 F ROLLBACK [rsBackgroundSync] RecoverToStableTimestamp failed. :: caused by :: UnrecoverableRollbackError: Error rolling back to stable. Err: Invalid argument

       

       

       

        1. image-2018-11-13-21-54-57-681.png
          337 kB
          Arun Lakhotia [X]
        2. mongod-0.tgz
          78.95 MB
          Arun Lakhotia [X]
        3. mongod-1.tgz
          59.13 MB
          Arun Lakhotia [X]
        4. mongodb-0.log-2018-11-14
          30.12 MB
          Arun Lakhotia [X]
        5. mongodb-1.log-2018-11-14
          6.53 MB
          Arun Lakhotia [X]
        6. node1.png
          313 kB
          Bruce Lucas

            Assignee:
            daniel.hatcher@mongodb.com Danny Hatcher (Inactive)
            Reporter:
            DrArunL Arun Lakhotia [X]
            Votes:
            0 Vote for this issue
            Watchers:
            13 Start watching this issue

              Created:
              Updated:
              Resolved: