  Core Server / SERVER-32544

Corrupt DB after Hard Crash (repeated 3 times)

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Critical - P2
    • Fix Version/s: None
    • Affects Version/s: 3.6.1
    • Component/s: Stability, Storage
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL

      This won't be easy.

      CentOS 7 VM (VirtualBox 5.1.30 on OS X), 2 cores, 8 GB RAM
      Use mlaunch to start a 2-shard CSRS MongoDB 3.6.1 cluster with a 0.5 GB WiredTiger cache per node (repro sketch below)
      mongorestore 5 GB of data (generated with POCDriver)

      The VM crashes, sometimes leaving the database corrupted.

      Database files are available.
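
      For reference, a minimal sketch of the repro commands. The mlaunch flags, credentials, and dump path below are assumptions reconstructed from the mongod options visible in the log further down, not the exact commands that were run:

      # Assumed mlaunch invocation (mtools); flags mlaunch does not recognise are passed
      # through to mongod, which is how the 0.5 GB WiredTiger cache in the logged options
      # would be set. Config servers run as a replica set (CSRS) by default on 3.4+.
      mlaunch init --dir /root/data \
          --sharded 2 --replicaset --nodes 3 --config 3 \
          --auth \
          --wiredTigerCacheSizeGB 0.5

      # Restore ~5 GB of POCDriver-generated data through the mongos on 27017.
      # Credentials are mlaunch's --auth defaults; the dump path is a placeholder.
      mongorestore --port 27017 -u user -p password --authenticationDatabase admin \
          /path/to/pocdriver-dump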


      MongoDB 3.6 - CentOS 7

      Running in a VM (VirtualBox on OS X); VirtualBox has crashed several times with a memory error. Note: this crash itself is NOT the bug.

      After such a crash, the WiredTiger databases that were being written to at the time are corrupted and the mongod cannot start:

      2018-01-04T16:45:57.564+0000 I CONTROL  [initandlisten] MongoDB starting : pid=1339 port=27024 dbpath=/root/data/configRepl/rs1/db 64-bit host=centos7.local
      2018-01-04T16:45:57.564+0000 I CONTROL  [initandlisten] db version v3.6.1
      2018-01-04T16:45:57.564+0000 I CONTROL  [initandlisten] git version: 025d4f4fe61efd1fb6f0005be20cb45a004093d1
      2018-01-04T16:45:57.564+0000 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.1e-fips 11 Feb 2013
      2018-01-04T16:45:57.564+0000 I CONTROL  [initandlisten] allocator: tcmalloc
      2018-01-04T16:45:57.564+0000 I CONTROL  [initandlisten] modules: none
      2018-01-04T16:45:57.564+0000 I CONTROL  [initandlisten] build environment:
      2018-01-04T16:45:57.564+0000 I CONTROL  [initandlisten]     distmod: rhel70
      2018-01-04T16:45:57.564+0000 I CONTROL  [initandlisten]     distarch: x86_64
      2018-01-04T16:45:57.564+0000 I CONTROL  [initandlisten]     target_arch: x86_64
      2018-01-04T16:45:57.564+0000 I CONTROL  [initandlisten] options: { net: { port: 27024 }, processManagement: { fork: true }, replication: { replSet: "configRepl" }, security: { keyFile: "/root/data/keyfile" }, sharding: { clusterRole: "configsvr" }, storage: { dbPath: "/root/data/configRepl/rs1/db", wiredTiger: { engineConfig: { cacheSizeGB: 0.5 } } }, systemLog: { destination: "file", path: "/root/data/configRepl/rs1/mongod.log" } }
      2018-01-04T16:45:57.565+0000 W -        [initandlisten] Detected unclean shutdown - /root/data/configRepl/rs1/db/mongod.lock is not empty.
      2018-01-04T16:45:57.566+0000 I -        [initandlisten] Detected data files in /root/data/configRepl/rs1/db created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
      2018-01-04T16:45:57.566+0000 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
      2018-01-04T16:45:57.566+0000 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=512M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
      2018-01-04T16:45:57.950+0000 I STORAGE  [initandlisten] WiredTiger message [1515084357:950005][1339:0x7f560203bb00], txn-recover: Main recovery loop: starting at 2/42880
      2018-01-04T16:45:57.950+0000 I STORAGE  [initandlisten] WiredTiger message [1515084357:950414][1339:0x7f560203bb00], txn-recover: Recovering log 2 through 3
      2018-01-04T16:45:58.029+0000 I STORAGE  [initandlisten] WiredTiger message [1515084358:29048][1339:0x7f560203bb00], file:index-49--1899953198262174362.wt, txn-recover: Recovering log 3 through 3
      2018-01-04T16:45:58.186+0000 I STORAGE  [initandlisten] Starting WiredTigerRecordStoreThread local.oplog.rs
      2018-01-04T16:45:58.186+0000 I STORAGE  [initandlisten] The size storer reports that the oplog contains 113 records totaling to 24685 bytes
      2018-01-04T16:45:58.186+0000 I STORAGE  [initandlisten] Scanning the oplog to determine where to place markers for truncation
      2018-01-04T16:45:58.199+0000 I CONTROL  [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
      2018-01-04T16:45:58.199+0000 I CONTROL  [initandlisten] 
      2018-01-04T16:45:58.199+0000 I CONTROL  [initandlisten] ** WARNING: This server is bound to localhost.
      2018-01-04T16:45:58.199+0000 I CONTROL  [initandlisten] **          Remote systems will be unable to connect to this server. 
      2018-01-04T16:45:58.199+0000 I CONTROL  [initandlisten] **          Start the server with --bind_ip <address> to specify which IP 
      2018-01-04T16:45:58.199+0000 I CONTROL  [initandlisten] **          addresses it should serve responses from, or with --bind_ip_all to
      2018-01-04T16:45:58.199+0000 I CONTROL  [initandlisten] **          bind to all interfaces. If this behavior is desired, start the
      2018-01-04T16:45:58.199+0000 I CONTROL  [initandlisten] **          server with --bind_ip 127.0.0.1 to disable this warning.
      2018-01-04T16:45:58.199+0000 I CONTROL  [initandlisten] 
      2018-01-04T16:45:58.200+0000 I CONTROL  [initandlisten] 
      2018-01-04T16:45:58.200+0000 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
      2018-01-04T16:45:58.200+0000 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
      2018-01-04T16:45:58.200+0000 I CONTROL  [initandlisten] 
      2018-01-04T16:45:58.200+0000 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
      2018-01-04T16:45:58.200+0000 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
      2018-01-04T16:45:58.200+0000 I CONTROL  [initandlisten] 
      2018-01-04T16:45:58.245+0000 I FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory '/root/data/configRepl/rs1/db/diagnostic.data'
      2018-01-04T16:45:58.246+0000 I SHARDING [initandlisten] first cluster operation detected, adding sharding hook to enable versioning and authentication to remote servers
      2018-01-04T16:45:58.246+0000 I SHARDING [thread1] creating distributed lock ping thread for process ConfigServer (sleeping for 30000ms)
      2018-01-04T16:45:58.248+0000 I REPL     [initandlisten] Starting recovery oplog application at the appliedThrough: { ts: Timestamp(1515080359, 2), t: 2 }
      2018-01-04T16:45:58.248+0000 I REPL     [initandlisten] Replaying stored operations from { : Timestamp(1515080359, 2) } (exclusive) to { : Timestamp(1515082949, 2) } (inclusive).
      2018-01-04T16:45:58.249+0000 F REPL     [initandlisten] Oplog entry at { : Timestamp(1515080359, 2) } is missing; actual entry found is { : Timestamp(1515082746, 1) }
      2018-01-04T16:45:58.249+0000 F -        [initandlisten] Fatal Assertion 40292 at src/mongo/db/repl/replication_recovery.cpp 217
      2018-01-04T16:45:58.249+0000 F -        [initandlisten]
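
      Fatal assertion 40292 fires during replication recovery, before the node accepts connections. A hedged way to inspect what actually survives on disk (assuming the WiredTiger layer itself still opens, as the journal replay above suggests) is to start the damaged config node standalone and compare the recorded recovery point with the oplog entries that are present. The standalone port and log path below are arbitrary:

      # Start the damaged node standalone (no --replSet/--configsvr) so replication
      # recovery, where fassert 40292 is raised, is skipped entirely.
      mongod --dbpath /root/data/configRepl/rs1/db --port 28024 \
          --fork --logpath /tmp/configRepl-standalone.log

      # Compare the expected recovery point with the oplog that survived the crash.
      mongo --port 28024 --quiet --eval '
          var local = db.getSiblingDB("local");
          // appliedThrough / minValid document the server tries to resume from
          local.getCollection("replset.minvalid").find().forEach(printjson);
          // oldest and newest oplog entries actually on disk
          local.getCollection("oplog.rs").find().sort({$natural: 1}).limit(1).forEach(printjson);
          local.getCollection("oplog.rs").find().sort({$natural: -1}).limit(1).forEach(printjson);'

      In this instance, per the log above, recovery expects to resume from { ts: Timestamp(1515080359, 2) } but the nearest entry found is { ts: Timestamp(1515082746, 1) }, i.e. the oplog entry it needs is missing after the hard crash.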
      

            Assignee:
            Daniel Gottlieb (Inactive) <daniel.gottlieb@mongodb.com>
            Reporter:
            John Page <john.page@mongodb.com>
            Votes:
            0
            Watchers:
            8

              Created:
              Updated:
              Resolved: