Core Server / SERVER-28242

WiredTiger.wt file corrupted after unclean shutdown, cannot recover

    • Linux

      We run MongoDB 3.2.9 in an OpenShift cluster using the official Red Hat pod (hence the slightly outdated version). For reasons not yet known, the pod failed to stop cleanly, which corrupted the WiredTiger data files.

      The error is basically the same as described in SERVER-23346, SERVER-27777, SERVER-25770 and possibly others. However, none of the tickets we found includes instructions on how to fix the problem ourselves.

      We copied the data files to another server running MongoDB 3.4.2, hoping for fixes in the later version, but that did not help.

      file:WiredTiger.wt, connection: unable to read root page from file:WiredTiger.wt: WT_ERROR: non-specific WiredTiger error

      2017-03-08T09:40:48.386+0100 I CONTROL  [main] ***** SERVER RESTARTED *****
      2017-03-08T09:40:48.390+0100 I CONTROL  [initandlisten] MongoDB starting : pid=8437 port=27017 dbpath=/var/lib/mongodb 64-bit host=mongo
      2017-03-08T09:40:48.390+0100 I CONTROL  [initandlisten] db version v3.4.2
      2017-03-08T09:40:48.390+0100 I CONTROL  [initandlisten] git version: 3f76e40c105fc223b3e5aac3e20dcd026b83b38b
      2017-03-08T09:40:48.390+0100 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.2g-fips  1 Mar 2016
      2017-03-08T09:40:48.390+0100 I CONTROL  [initandlisten] allocator: tcmalloc
      2017-03-08T09:40:48.390+0100 I CONTROL  [initandlisten] modules: none
      2017-03-08T09:40:48.390+0100 I CONTROL  [initandlisten] build environment:
      2017-03-08T09:40:48.390+0100 I CONTROL  [initandlisten]     distmod: ubuntu1604
      2017-03-08T09:40:48.390+0100 I CONTROL  [initandlisten]     distarch: x86_64
      2017-03-08T09:40:48.390+0100 I CONTROL  [initandlisten]     target_arch: x86_64
      2017-03-08T09:40:48.390+0100 I CONTROL  [initandlisten] options: { config: "/etc/mongod.conf", net: { bindIp: "::,0.0.0.0", ipv6: true, port: 27017 }, security: { authorization: "enabled" }, storage: { dbPath: "/var/lib/mongodb", engine: "wiredTiger", journal: { enabled: true }, wiredTiger: { engineConfig: { cacheSizeGB: 0.1 } } }, systemLog: { destination: "file", logAppend: true, path: "/var/log/mongodb/mongod.log", quiet: true } }
      2017-03-08T09:40:48.390+0100 W -        [initandlisten] Detected unclean shutdown - /var/lib/mongodb/mongod.lock is not empty.
      2017-03-08T09:40:48.408+0100 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
      2017-03-08T09:40:48.408+0100 I STORAGE  [initandlisten] 
      2017-03-08T09:40:48.408+0100 I STORAGE  [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
      2017-03-08T09:40:48.408+0100 I STORAGE  [initandlisten] **          See http://dochub.mongodb.org/core/prodnotes-filesystem
      2017-03-08T09:40:48.408+0100 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=102M,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
      2017-03-08T09:40:48.415+0100 E STORAGE  [initandlisten] WiredTiger error (-31802) [1488962448:415285][8437:0x7fe40cd95cc0], file:WiredTiger.wt, connection: unable to read root page from file:WiredTiger.wt: WT_ERROR: non-specific WiredTiger error
      2017-03-08T09:40:48.415+0100 E STORAGE  [initandlisten] WiredTiger error (0) [1488962448:415325][8437:0x7fe40cd95cc0], file:WiredTiger.wt, connection: WiredTiger has failed to open its metadata
      2017-03-08T09:40:48.415+0100 E STORAGE  [initandlisten] WiredTiger error (0) [1488962448:415330][8437:0x7fe40cd95cc0], file:WiredTiger.wt, connection: This may be due to the database files being encrypted, being from an older version or due to corruption on disk
      2017-03-08T09:40:48.415+0100 E STORAGE  [initandlisten] WiredTiger error (0) [1488962448:415334][8437:0x7fe40cd95cc0], file:WiredTiger.wt, connection: You should confirm that you have opened the database with the correct options including all encryption and compression options
      2017-03-08T09:40:48.415+0100 I -        [initandlisten] Assertion: 28595:-31802: WT_ERROR: non-specific WiredTiger error src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp 267
      2017-03-08T09:40:48.418+0100 I STORAGE  [initandlisten] exception in initAndListen: 28595 -31802: WT_ERROR: non-specific WiredTiger error, terminating
      2017-03-08T09:40:48.418+0100 I NETWORK  [initandlisten] shutdown: going to close listening sockets...
      2017-03-08T09:40:48.418+0100 I NETWORK  [initandlisten] removing socket file: /tmp/mongodb-27017.sock
      2017-03-08T09:40:48.418+0100 I NETWORK  [initandlisten] shutdown: going to flush diaglog...
      2017-03-08T09:40:48.418+0100 I CONTROL  [initandlisten] now exiting
      2017-03-08T09:40:48.418+0100 I CONTROL  [initandlisten] shutting down with code:100
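When comparing this log against the ones in the related tickets, it can help to pull out only the WiredTiger error lines. The following is a minimal sketch, assuming the mongod 3.4 plain-text log layout shown above; the regular expression and the names `WT_ERROR`, `parse_wt_errors`, and `SAMPLE` are illustrative, not part of any MongoDB tool. It also converts the epoch seconds inside the bracketed WiredTiger timestamp to UTC, so the entry can be matched against the mongod timestamp at the start of the line.

```python
import re
from datetime import datetime, timezone

# Pattern for the "E STORAGE ... WiredTiger error (...)" lines in the log above.
# This is an assumption based on the excerpt, not an official log grammar.
WT_ERROR = re.compile(
    r"^(?P<ts>\S+) E STORAGE\s+\[(?P<thread>[^\]]+)\] "
    r"WiredTiger error \((?P<code>-?\d+)\) "
    r"\[(?P<epoch>\d+):(?P<usec>\d+)\]\[[^\]]+\], "
    r"(?P<message>.*)$"
)


def parse_wt_errors(lines):
    """Return one dict per WiredTiger error line; all other lines are skipped."""
    errors = []
    for line in lines:
        match = WT_ERROR.match(line.strip())
        if match is None:
            continue
        errors.append({
            "code": int(match["code"]),
            # Seconds part of the bracketed WiredTiger timestamp, as UTC.
            "time_utc": datetime.fromtimestamp(int(match["epoch"]), tz=timezone.utc),
            "message": match["message"],
        })
    return errors


# Three lines copied from the log excerpt above.
SAMPLE = [
    "2017-03-08T09:40:48.408+0100 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.",
    "2017-03-08T09:40:48.415+0100 E STORAGE  [initandlisten] WiredTiger error (-31802) [1488962448:415285][8437:0x7fe40cd95cc0], file:WiredTiger.wt, connection: unable to read root page from file:WiredTiger.wt: WT_ERROR: non-specific WiredTiger error",
    "2017-03-08T09:40:48.415+0100 E STORAGE  [initandlisten] WiredTiger error (0) [1488962448:415325][8437:0x7fe40cd95cc0], file:WiredTiger.wt, connection: WiredTiger has failed to open its metadata",
]

if __name__ == "__main__":
    for err in parse_wt_errors(SAMPLE):
        print(err["code"], err["time_utc"].isoformat(), err["message"])
```

The first error returned carries code -31802, the same code that appears in the later assertion and in the "exception in initAndListen" line.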
      

      mongod --repair results in basically the same errors.
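Since each repair attempt runs against the damaged files, it seems prudent to run it only on a copy, much like the copy we moved to the 3.4.2 server. The following is a rough sketch of that precaution, assuming mongod is stopped while the copy is taken; `backup_dbpath` and `repair_command` are hypothetical helper names, and the sketch only builds the `mongod --repair` command line rather than running it.

```python
import shutil
import tempfile
import time
from pathlib import Path


def backup_dbpath(dbpath, backup_root):
    """Copy the whole dbpath into a new timestamped directory and return its path."""
    source = Path(dbpath)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    target = Path(backup_root) / f"{source.name}-{stamp}"
    # copytree raises FileExistsError if the target already exists,
    # so an earlier backup is never overwritten.
    shutil.copytree(source, target)
    return target


def repair_command(dbpath):
    """Build (but do not run) the repair invocation for the given dbpath."""
    return ["mongod", "--dbpath", str(dbpath), "--repair"]


if __name__ == "__main__":
    # Demonstration on a throwaway directory, not on a real dbpath.
    with tempfile.TemporaryDirectory() as tmp:
        fake_dbpath = Path(tmp) / "mongodb"
        fake_dbpath.mkdir()
        (fake_dbpath / "WiredTiger.wt").write_bytes(b"placeholder")
        copy = backup_dbpath(fake_dbpath, Path(tmp) / "backups")
        print("repair would run as:", " ".join(repair_command(copy)))
```

Pointing the repair at the copy leaves the original files untouched for further analysis.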

      Could you please have a look at this?

      It would also be great if you could provide instructions for recovering the WiredTiger.wt file ourselves: we have a second database with the same issue, and I fear this will not be the last unclean shutdown of an OpenShift pod that we see.

      Thank you!

        1. collection-174--8777641835294838235.wt (36 kB, Tobias Brunner)
        2. db2.tgz (33 kB, Manuel Hutter)
        3. repair_attempt.tar.gz (38 kB, Kelsey Schubert)
        4. repair_attempt-2.tar.gz (33 kB, Kelsey Schubert)
        5. WiredTiger.turtle (0.9 kB, David Gubler)
        6. WiredTiger.wt (740 kB, David Gubler)

            Assignee:
            kelsey.schubert@mongodb.com Kelsey Schubert
            Reporter:
            david.gubler David Gubler
            Votes:
            0
            Watchers:
            10