Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-20597

Save data that is rolled back to the rollback directory

    • Type: Icon: Improvement Improvement
    • Resolution: Duplicate
    • Priority: Icon: Critical - P2 Critical - P2
    • None
    • Affects Version/s: 3.0.2
    • Component/s: Storage
    • None

      env:
      3-node RS on 3.0.2

      sequence:
      1) reconfig RS to have slaveDelay on both secondaries (say, 60 min)
      2) insert a doc on the primary. Here is the oplog:

      > db.oplog.rs.find().sort({$natural:1})
      { "ts" : Timestamp(1443061365, 1), "h" : NumberLong(0), "v" : 2, "op" : "n", "ns" : "", "o" : { "msg" : "initiating set" } }
      { "ts" : Timestamp(1443064118, 1), "h" : NumberLong(0), "v" : 2, "op" : "n", "ns" : "", "o" : { "msg" : "Reconfig set", "version" : 2 } }
      { "ts" : Timestamp(1443064145, 1), "h" : NumberLong("-5315938192568877039"), "v" : 2, "op" : "c", "ns" : "test.$cmd", "o" : { "create" : "test" } }
      { "ts" : Timestamp(1443064145, 2), "h" : NumberLong("-1100597721925905403"), "v" : 2, "op" : "i", "ns" : "test.test", "o" : { "_id" : ObjectId("56036951c98eee1d6f729dae"), "x" : 1 } }
      

      3) shut down all nodes, restart them as standalones, and manually alter the local.system.replset to remove slaveDelay
      4) start both former secondaries back into the replica set - they will elect a new primary, and then insert more data, so the oplog will look like this:

      { "ts" : Timestamp(1443061365, 1), "h" : NumberLong(0), "v" : 2, "op" : "n", "ns" : "", "o" : { "msg" : "initiating set" } }
      { "ts" : Timestamp(1443065179, 1), "h" : NumberLong("-5315862695633747950"), "v" : 2, "op" : "c", "ns" : "test.$cmd", "o" : { "create" : "test" } }
      { "ts" : Timestamp(1443065179, 2), "h" : NumberLong("-932390550458295111"), "v" : 2, "op" : "i", "ns" : "test.test", "o" : { "_id" : ObjectId("56036d5b8bfc46dee02d439f"), "x" : 2 } }
      { "ts" : Timestamp(1443065181, 1), "h" : NumberLong("2732461908951260525"), "v" : 2, "op" : "i", "ns" : "test.test", "o" : { "_id" : ObjectId("56036d5d8bfc46dee02d43a0"), "x" : 3 } }
      

      5) start the former primary back and observe the ROLLBACK:

      2015-09-24T11:33:46.852+0800 I REPL     [ReplicationExecutor] New replica set config in use: { _id: "replset", version: 3, members: [ { _id: 0, host: "AD-MAC10G-2.local:27050", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1, host: "AD-MAC10G-2.local:27051", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 2, host: "AD-MAC10G-2.local:27052", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatTimeoutSecs: 10, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 } } }
      2015-09-24T11:33:46.852+0800 I REPL     [ReplicationExecutor] This node is AD-MAC10G-2.local:27050 in the config
      2015-09-24T11:33:46.853+0800 I REPL     [ReplicationExecutor] transition to STARTUP2
      2015-09-24T11:33:46.853+0800 I REPL     [ReplicationExecutor] Starting replication applier threads
      2015-09-24T11:33:46.854+0800 I REPL     [ReplicationExecutor] transition to RECOVERING
      2015-09-24T11:33:46.856+0800 I REPL     [ReplicationExecutor] transition to SECONDARY
      2015-09-24T11:33:46.857+0800 I REPL     [ReplicationExecutor] Member AD-MAC10G-2.local:27051 is now in state PRIMARY
      2015-09-24T11:33:46.857+0800 I REPL     [ReplicationExecutor] Member AD-MAC10G-2.local:27052 is now in state SECONDARY
      2015-09-24T11:33:49.866+0800 I REPL     [ReplicationExecutor] syncing from: AD-MAC10G-2.local:27052
      2015-09-24T11:33:49.870+0800 I REPL     [SyncSourceFeedback] replset setting syncSourceFeedback to AD-MAC10G-2.local:27052
      2015-09-24T11:33:49.870+0800 I REPL     [rsBackgroundSync] replSet our last op time fetched: Sep 24 11:09:05:2
      2015-09-24T11:33:49.871+0800 I REPL     [rsBackgroundSync] replset source's GTE: Sep 24 11:26:19:1
      2015-09-24T11:33:49.871+0800 I REPL     [rsBackgroundSync] beginning rollback
      2015-09-24T11:33:49.872+0800 I REPL     [rsBackgroundSync] rollback 0
      2015-09-24T11:33:49.873+0800 I REPL     [ReplicationExecutor] transition to ROLLBACK
      2015-09-24T11:33:49.876+0800 I REPL     [rsBackgroundSync] rollback 1
      2015-09-24T11:33:49.876+0800 I REPL     [rsBackgroundSync] rollback 2 FindCommonPoint
      2015-09-24T11:33:49.880+0800 I REPL     [rsBackgroundSync] replSet info rollback our last optime:   Sep 24 11:09:05:2
      2015-09-24T11:33:49.881+0800 I REPL     [rsBackgroundSync] replSet info rollback their last optime: Sep 24 11:26:21:1
      2015-09-24T11:33:49.881+0800 I REPL     [rsBackgroundSync] replSet info rollback diff in end of log times: -1036 seconds
      2015-09-24T11:33:49.882+0800 I REPL     [rsBackgroundSync] replSet rollback found matching events at Sep 24 10:22:45:1
      2015-09-24T11:33:49.882+0800 I REPL     [rsBackgroundSync] replSet rollback findcommonpoint scanned : 7
      2015-09-24T11:33:49.882+0800 I REPL     [rsBackgroundSync] replSet rollback 3 fixup
      2015-09-24T11:33:49.883+0800 I REPL     [rsBackgroundSync] rollback 3.5
      2015-09-24T11:33:49.884+0800 I REPL     [rsBackgroundSync] rollback 4 n:1
      2015-09-24T11:33:49.885+0800 I REPL     [rsBackgroundSync] replSet minvalid=Sep 24 11:26:21 56036d5d:1
      2015-09-24T11:33:49.894+0800 I REPL     [rsBackgroundSync] rollback 4.6
      2015-09-24T11:33:49.895+0800 I REPL     [rsBackgroundSync] replSet rollback drop: test.test
      2015-09-24T11:33:49.898+0800 I REPL     [rsBackgroundSync] rollback 4.7
      2015-09-24T11:33:49.898+0800 E REPL     [rsBackgroundSync] rollback cannot find object by id
      2015-09-24T11:33:49.899+0800 I REPL     [rsBackgroundSync] rollback 5 d:1 u:0
      2015-09-24T11:33:49.900+0800 I REPL     [rsBackgroundSync] rollback 6
      2015-09-24T11:33:49.900+0800 I REPL     [rsBackgroundSync] rollback done
      2015-09-24T11:33:49.901+0800 I REPL     [ReplicationExecutor] transition to RECOVERING
      2015-09-24T11:33:49.902+0800 I REPL     [rsBackgroundSync] rollback finished
      2015-09-24T11:33:49.903+0800 I REPL     [ReplicationExecutor] syncing from: AD-MAC10G-2.local:27052
      2015-09-24T11:33:49.909+0800 I REPL     [ReplicationExecutor] transition to SECONDARY
      

      However, there is no "rollback" directory in the dbpath after the rollback:

      ├── db
      │   ├── journal
      │   │   ├── j._0
      │   │   └── lsn
      │   ├── local.0
      │   ├── local.1
      │   ├── local.ns
      │   ├── mongod.lock
      │   ├── storage.bson
      │   ├── test.0
      │   └── test.ns
      └── mongod.log
      

            Assignee:
            benety.goh@mongodb.com Benety Goh
            Reporter:
            alex.komyagin@mongodb.com Alexander Komyagin (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated:
              Resolved: