Type: Bug
Resolution: Done
Priority: Major - P3
None
Affects Version/s: 3.4.2
Component/s: WiredTiger
Linux
We have a development database: a 3-member replica set running MongoDB server 3.4.2 on Amazon Linux (EC2 t2.medium), recently upgraded from 3.2.x.
rs.status() reported that all members were healthy.
We restarted all three members, in the following order, to apply a logRotate configuration change:
secondary 3 (hidden:true, priority:0, port:29019)
secondary 2 (priority:20, port:29018)
primary (priority:30, port:29017)
Each member was restarted with "service mongod restart", with a short pause between restarts.
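The restart order above (secondaries first, primary last) can be expressed as a small helper. This is a minimal sketch, not a MongoDB tool: the Member class, the host names, and the choice to order secondaries by ascending priority (which reproduces the order used in this report) are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """Hypothetical, simplified view of one replica set member."""
    name: str
    priority: int
    is_primary: bool = False


def rolling_restart_order(members):
    """Return member names with all secondaries first (lowest priority
    first, an assumed tie-breaking rule) and the current primary last."""
    secondaries = sorted(
        (m for m in members if not m.is_primary), key=lambda m: m.priority
    )
    primaries = [m for m in members if m.is_primary]
    return [m.name for m in secondaries + primaries]


# Hypothetical host names mirroring the ports and priorities in this report.
members = [
    Member("primary:29017", 30, is_primary=True),
    Member("secondary2:29018", 20),
    Member("secondary3:29019", 0),
]
print(rolling_restart_order(members))
# ['secondary3:29019', 'secondary2:29018', 'primary:29017']
```

Restarting the primary last keeps the number of elections to a minimum during the rolling restart.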
This database had no clients connected at the time.
After restarting all three, we ran rs.status() in the mongo shell. The primary and secondary 2 were healthy, but secondary 3 was unhealthy and could not be contacted. It also can no longer be started.
The first error in the secondary 3 log is the following:
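Spotting the unhealthy member can be automated from the rs.status() document. This sketch assumes the document has already been fetched as a Python dict (through a driver it comes from the replSetGetStatus admin command, e.g. pymongo's client.admin.command("replSetGetStatus")). It relies only on the "members", "name", "health" and "stateStr" fields. The host names in the sample are hypothetical.

```python
def unhealthy_members(status):
    """Return (name, stateStr) for each member whose health is not 1.

    `status` is assumed to be shaped like rs.status() output. A member
    with no "health" field is treated as unhealthy.
    """
    return [
        (m["name"], m.get("stateStr", "UNKNOWN"))
        for m in status.get("members", [])
        if m.get("health") != 1
    ]


# Hypothetical status resembling the situation described above.
status = {
    "members": [
        {"name": "db1:29017", "health": 1, "stateStr": "PRIMARY"},
        {"name": "db2:29018", "health": 1, "stateStr": "SECONDARY"},
        {"name": "db3:29019", "health": 0, "stateStr": "(not reachable/healthy)"},
    ]
}
print(unhealthy_members(status))  # [('db3:29019', '(not reachable/healthy)')]
```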
2017-03-01T15:55:00.534+0000 E STORAGE [repl writer worker 8] WiredTiger error (0) [1488383700:534159][9280:0x7f8edc3d3700], file:index-14-4040745961588520825.wt, WT_SESSION.open_cursor: read checksum error for 4096B block at offset 45056: block header checksum of 3605474371 doesn't match expected checksum of 3806882032
2017-03-01T15:55:00.534+0000 E STORAGE [repl writer worker 8] WiredTiger error (0) [1488383700:534218][9280:0x7f8edc3d3700], file:index-14-4040745961588520825.wt, WT_SESSION.open_cursor: index-14-4040745961588520825.wt: encountered an illegal file format or internal value
2017-03-01T15:55:00.534+0000 E STORAGE [repl writer worker 8] WiredTiger error (-31804) [1488383700:534226][9280:0x7f8edc3d3700], file:index-14-4040745961588520825.wt, WT_SESSION.open_cursor: the process must exit and restart: WT_PANIC: WiredTiger library panic
2017-03-01T15:55:00.534+0000 I - [repl writer worker 8] Fatal Assertion 28558 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 361
2017-03-01T15:55:00.534+0000 I - [repl writer worker 8] ***aborting after fassert() failure
2017-03-01T15:55:00.570+0000 F - [repl writer worker 8] Got signal: 6 (Aborted).
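To compare checksum failures across crashes, the relevant fields can be pulled out of the WiredTiger message. This is a sketch only: the regular expression is written against the exact message text in this log and is not a documented log format, and checksum_errors is a hypothetical helper name.

```python
import re

# Pattern written against the checksum-mismatch message quoted above.
CHECKSUM_RE = re.compile(
    r"file:(?P<file>[^,\s]+), WT_SESSION\.open_cursor: "
    r"read checksum error for (?P<size>\d+)B block at offset (?P<offset>\d+): "
    r"block header checksum of (?P<found>\d+) "
    r"doesn't match expected checksum of (?P<expected>\d+)"
)


def checksum_errors(log_text):
    """Return one dict per WiredTiger checksum mismatch found in log_text."""
    return [
        {
            "file": m["file"],
            "block_size": int(m["size"]),
            "offset": int(m["offset"]),
            "found": int(m["found"]),
            "expected": int(m["expected"]),
        }
        for m in CHECKSUM_RE.finditer(log_text)
    ]


sample = (
    "2017-03-01T15:55:00.534+0000 E STORAGE [repl writer worker 8] "
    "WiredTiger error (0) [1488383700:534159][9280:0x7f8edc3d3700], "
    "file:index-14-4040745961588520825.wt, WT_SESSION.open_cursor: "
    "read checksum error for 4096B block at offset 45056: block header "
    "checksum of 3605474371 doesn't match expected checksum of 3806882032"
)
for e in checksum_errors(sample):
    print(e["file"], e["offset"])  # index-14-4040745961588520825.wt 45056
```

Both logs in this report name the same file, offset and checksum pair, which suggests the damaged block is persistent on disk rather than a one-off read error.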
Now, immediately on restart, the following is logged:
2017-03-01T16:39:22.408+0000 E STORAGE [repl writer worker 7] WiredTiger error (0) [1488386362:408202][9783:0x7fde825da700], file:index-14-4040745961588520825.wt, WT_SESSION.open_cursor: read checksum error for 4096B block at offset 45056: block header checksum of 3605474371 doesn't match expected checksum of 3806882032
2017-03-01T16:39:22.408+0000 E STORAGE [repl writer worker 7] WiredTiger error (0) [1488386362:408228][9783:0x7fde825da700], file:index-14-4040745961588520825.wt, WT_SESSION.open_cursor: index-14-4040745961588520825.wt: encountered an illegal file format or internal value
2017-03-01T16:39:22.408+0000 E STORAGE [repl writer worker 7] WiredTiger error (-31804) [1488386362:408245][9783:0x7fde825da700], file:index-14-4040745961588520825.wt, WT_SESSION.open_cursor: the process must exit and restart: WT_PANIC: WiredTiger library panic
2017-03-01T16:39:22.408+0000 I - [repl writer worker 7] Fatal Assertion 28558 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 361
2017-03-01T16:39:22.408+0000 I - [repl writer worker 7] ***aborting after fassert() failure
2017-03-01T16:39:22.408+0000 I - [repl writer worker 15] Fatal Assertion 28559 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 64
2017-03-01T16:39:22.408+0000 I - [repl writer worker 15] ***aborting after fassert() failure
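On restart, a second writer thread also hits a fatal assertion (28559). The sketch below lists every fatal assertion with its thread and source location, even when several log entries have been run together on one line. Its assumptions: each entry starts with a timestamp in the format shown here, and split_entries and fatal_assertions are hypothetical helper names. Splitting on a zero-width lookahead requires Python 3.7 or later.

```python
import re

# An entry begins with a timestamp like 2017-03-01T16:39:22.408+0000.
TS_RE = re.compile(r"(?=\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4} )")
FASSERT_RE = re.compile(
    r"\[(?P<thread>[^\]]+)\] Fatal Assertion (?P<code>\d+) at (?P<where>\S+ \d+)"
)


def split_entries(log_text):
    """Split text into log entries at each leading timestamp."""
    return [e.strip() for e in TS_RE.split(log_text) if e.strip()]


def fatal_assertions(log_text):
    """Return (thread, assertion code, source location) for each fassert."""
    found = []
    for entry in split_entries(log_text):
        m = FASSERT_RE.search(entry)
        if m:
            found.append((m["thread"], int(m["code"]), m["where"]))
    return found


sample = (
    "2017-03-01T16:39:22.408+0000 I - [repl writer worker 7] Fatal Assertion 28558 "
    "at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 361 "
    "2017-03-01T16:39:22.408+0000 I - [repl writer worker 7] ***aborting after fassert() failure "
    "2017-03-01T16:39:22.408+0000 I - [repl writer worker 15] Fatal Assertion 28559 "
    "at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 64 "
    "2017-03-01T16:39:22.408+0000 I - [repl writer worker 15] ***aborting after fassert() failure"
)
for thread, code, where in fatal_assertions(sample):
    print(f"{thread}: fassert {code} at {where}")
```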