We run MongoDB 3.2.9 in an OpenShift cluster using the official Red Hat pod (hence the not quite up-to-date version). For reasons not yet known, the pod failed to stop cleanly, corrupting the WiredTiger data files.
The error is basically the same as described in SERVER-23346, SERVER-27777, SERVER-25770 and possibly others. However, none of the tickets we found includes instructions on how to fix the problem ourselves.
We copied the data files to another server running MongoDB 3.4.2, hoping for fixes in the later version, but that did not help.
file:WiredTiger.wt, connection: unable to read root page from file:WiredTiger.wt: WT_ERROR: non-specific WiredTiger error
2017-03-08T09:40:48.386+0100 I CONTROL [main] ***** SERVER RESTARTED *****
2017-03-08T09:40:48.390+0100 I CONTROL [initandlisten] MongoDB starting : pid=8437 port=27017 dbpath=/var/lib/mongodb 64-bit host=mongo
2017-03-08T09:40:48.390+0100 I CONTROL [initandlisten] db version v3.4.2
2017-03-08T09:40:48.390+0100 I CONTROL [initandlisten] git version: 3f76e40c105fc223b3e5aac3e20dcd026b83b38b
2017-03-08T09:40:48.390+0100 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.2g-fips 1 Mar 2016
2017-03-08T09:40:48.390+0100 I CONTROL [initandlisten] allocator: tcmalloc
2017-03-08T09:40:48.390+0100 I CONTROL [initandlisten] modules: none
2017-03-08T09:40:48.390+0100 I CONTROL [initandlisten] build environment:
2017-03-08T09:40:48.390+0100 I CONTROL [initandlisten] distmod: ubuntu1604
2017-03-08T09:40:48.390+0100 I CONTROL [initandlisten] distarch: x86_64
2017-03-08T09:40:48.390+0100 I CONTROL [initandlisten] target_arch: x86_64
2017-03-08T09:40:48.390+0100 I CONTROL [initandlisten] options: { config: "/etc/mongod.conf", net: { bindIp: "::,0.0.0.0", ipv6: true, port: 27017 }, security: { authorization: "enabled" }, storage: { dbPath: "/var/lib/mongodb", engine: "wiredTiger", journal: { enabled: true }, wiredTiger: { engineConfig: { cacheSizeGB: 0.1 } } }, systemLog: { destination: "file", logAppend: true, path: "/var/log/mongodb/mongod.log", quiet: true } }
2017-03-08T09:40:48.390+0100 W - [initandlisten] Detected unclean shutdown - /var/lib/mongodb/mongod.lock is not empty.
2017-03-08T09:40:48.408+0100 W STORAGE [initandlisten] Recovering data from the last clean checkpoint.
2017-03-08T09:40:48.408+0100 I STORAGE [initandlisten]
2017-03-08T09:40:48.408+0100 I STORAGE [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2017-03-08T09:40:48.408+0100 I STORAGE [initandlisten] ** See http://dochub.mongodb.org/core/prodnotes-filesystem
2017-03-08T09:40:48.408+0100 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=102M,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2017-03-08T09:40:48.415+0100 E STORAGE [initandlisten] WiredTiger error (-31802) [1488962448:415285][8437:0x7fe40cd95cc0], file:WiredTiger.wt, connection: unable to read root page from file:WiredTiger.wt: WT_ERROR: non-specific WiredTiger error
2017-03-08T09:40:48.415+0100 E STORAGE [initandlisten] WiredTiger error (0) [1488962448:415325][8437:0x7fe40cd95cc0], file:WiredTiger.wt, connection: WiredTiger has failed to open its metadata
2017-03-08T09:40:48.415+0100 E STORAGE [initandlisten] WiredTiger error (0) [1488962448:415330][8437:0x7fe40cd95cc0], file:WiredTiger.wt, connection: This may be due to the database files being encrypted, being from an older version or due to corruption on disk
2017-03-08T09:40:48.415+0100 E STORAGE [initandlisten] WiredTiger error (0) [1488962448:415334][8437:0x7fe40cd95cc0], file:WiredTiger.wt, connection: You should confirm that you have opened the database with the correct options including all encryption and compression options
2017-03-08T09:40:48.415+0100 I - [initandlisten] Assertion: 28595:-31802: WT_ERROR: non-specific WiredTiger error src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp 267
2017-03-08T09:40:48.418+0100 I STORAGE [initandlisten] exception in initAndListen: 28595 -31802: WT_ERROR: non-specific WiredTiger error, terminating
2017-03-08T09:40:48.418+0100 I NETWORK [initandlisten] shutdown: going to close listening sockets...
2017-03-08T09:40:48.418+0100 I NETWORK [initandlisten] removing socket file: /tmp/mongodb-27017.sock
2017-03-08T09:40:48.418+0100 I NETWORK [initandlisten] shutdown: going to flush diaglog...
2017-03-08T09:40:48.418+0100 I CONTROL [initandlisten] now exiting
2017-03-08T09:40:48.418+0100 I CONTROL [initandlisten] shutting down with code:100
Running mongod --repair produces essentially the same errors.
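For anyone reproducing this: because repair mode can rewrite data files, it seems safest to run it only against a copy of the data directory and leave the original untouched. The following is a minimal sketch of that workflow, not a fix for the corruption itself. The function names and the scratch path are hypothetical; only the mongod --repair and --dbpath options and the /var/lib/mongodb path come from our setup. It assumes no mongod process is running against either directory.

```python
import shutil
import subprocess
from pathlib import Path


def build_repair_command(dbpath):
    """Return the argv that runs mongod in repair mode against dbpath."""
    return ["mongod", "--repair", "--dbpath", str(dbpath)]


def repair_on_copy(source, scratch, run=subprocess.run):
    """Copy the data directory, then run repair on the copy only.

    `run` is injectable so the workflow can be exercised without mongod.
    Returns mongod's exit code.
    """
    source, scratch = Path(source), Path(scratch)
    if scratch.exists():
        # Never overwrite an earlier copy; it may be the only intact one.
        raise FileExistsError(f"refusing to overwrite {scratch}")
    shutil.copytree(source, scratch)
    result = run(build_repair_command(scratch))
    return result.returncode


if __name__ == "__main__":
    # The scratch location is an assumption chosen for illustration.
    code = repair_on_copy("/var/lib/mongodb", "/var/tmp/mongodb-repair-copy")
    print("mongod --repair exited with code", code)
```

In our case the exit code on the copy is still non-zero, but the original files stay available for any further recovery attempt.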
Could you please have a look at this?
It would also be great if you could write down instructions on how to recover the WiredTiger.wt file ourselves, because we have a second database with the same issue and I fear that this will not be the last time we'll see unclean shutdowns of OpenShift pods.
Thank you!