- Type: Bug
- Resolution: Done
- Priority: Critical - P2
- None
- Affects Version/s: 3.4.6
- Component/s: Index Maintenance
- None
- ALL
When creating a background index on our cluster with 7 shards (600 million documents), on a collection sharded by a hashed index, the server crashes continuously.
We created this index:
2017-09-14T20:42:29.543+0000 I INDEX [initandlisten] found 1 interrupted index build(s) on shipyard.investigation_cards
2017-09-14T20:42:29.543+0000 I INDEX [initandlisten] note: restart the server with --noIndexBuildRetry to skip index rebuilds
2017-09-14T20:42:29.545+0000 I INDEX [initandlisten] build index on: shipyard.investigation_cards properties: { v: 2, key: { account_id: 1, universe_id: 1, stilingue_array.call_id: 1, stilingue_array.page_id: 1, normalized_posted_at: 1 }, name: "sac_call_id", ns: "shipyard.investigation_cards", background: true }
2017-09-14T20:42:29.545+0000 I INDEX [initandlisten] building index using bulk method; build may temporarily use up to 500 megabytes of RAM
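For anyone trying to reproduce this, the index spec in the log above corresponds roughly to a `createIndexes` command document like the one sketched below. This is a reconstruction from the logged properties only; the exact command that was originally run is not shown in the log, and the helper name `build_create_indexes_command` is hypothetical.

```python
from collections import OrderedDict


def build_create_indexes_command(collection, name, keys, background=True):
    """Return a createIndexes command document for a single index.

    Hypothetical helper: it only assembles the document and does not
    talk to a server.
    """
    # Key order matters in a compound index, so it is kept explicit.
    key_doc = OrderedDict(keys)
    index_spec = OrderedDict(
        [("key", key_doc), ("name", name), ("background", background)]
    )
    # The command name must be the first field of the command document.
    return OrderedDict([("createIndexes", collection), ("indexes", [index_spec])])


# Key pattern copied from the "build index on" log line.
SAC_CALL_ID_KEYS = [
    ("account_id", 1),
    ("universe_id", 1),
    ("stilingue_array.call_id", 1),
    ("stilingue_array.page_id", 1),
    ("normalized_posted_at", 1),
]

command = build_create_indexes_command(
    "investigation_cards", "sac_call_id", SAC_CALL_ID_KEYS
)
```

With a driver, a document of this shape would typically be sent as a database command against the `shipyard` database; that step is left out here because it needs a running cluster.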
After building for some time, MongoDB crashed with this error:
2017-09-14T20:43:58.517+0000 E STORAGE [initandlisten] WiredTiger error (0) [1505421838:517507][852475:0x7f9e3c4b2d40], file:collection-22-3497018620930100997.wt, WT_CURSOR.next: read checksum error for 8192B block at offset 72198791168: block header checksum of 0 doesn't match expected checksum of 707510254
2017-09-14T20:43:58.517+0000 E STORAGE [initandlisten] WiredTiger error (0) [1505421838:517551][852475:0x7f9e3c4b2d40], file:collection-22-3497018620930100997.wt, WT_CURSOR.next: collection-22-3497018620930100997.wt: encountered an illegal file format or internal value
2017-09-14T20:43:58.517+0000 E STORAGE [initandlisten] WiredTiger error (-31804) [1505421838:517558][852475:0x7f9e3c4b2d40], file:collection-22-3497018620930100997.wt, WT_CURSOR.next: the process must exit and restart: WT_PANIC: WiredTiger library panic
2017-09-14T20:43:58.517+0000 I - [initandlisten] Fatal Assertion 28558 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 361
2017-09-14T20:43:58.517+0000 I - [initandlisten]
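To compare the failing file and block offset across the crashes, the checksum error lines can be pulled out of the logs with a small script. The sketch below assumes the message keeps exactly the wording shown above; the pattern and the function name `parse_checksum_error` are illustrative, not part of any MongoDB tooling.

```python
import re

# Pattern written against the message wording seen in this log only.
CHECKSUM_ERROR = re.compile(
    r"file:(?P<file>\S+?), WT_CURSOR\.next: read checksum error for "
    r"(?P<size>\d+)B block at offset (?P<offset>\d+): "
    r"block header checksum of (?P<found>\d+) "
    r"doesn't match expected checksum of (?P<expected>\d+)"
)


def parse_checksum_error(line):
    """Return the fields of a WiredTiger read checksum error, or None."""
    match = CHECKSUM_ERROR.search(line)
    if match is None:
        return None
    return {
        "file": match.group("file"),
        "size": int(match.group("size")),
        "offset": int(match.group("offset")),
        "found": int(match.group("found")),
        "expected": int(match.group("expected")),
    }


# Sample line copied from the crash log above.
SAMPLE = (
    "2017-09-14T20:43:58.517+0000 E STORAGE [initandlisten] WiredTiger error (0) "
    "[1505421838:517507][852475:0x7f9e3c4b2d40], "
    "file:collection-22-3497018620930100997.wt, WT_CURSOR.next: "
    "read checksum error for 8192B block at offset 72198791168: "
    "block header checksum of 0 doesn't match expected checksum of 707510254"
)

parsed = parse_checksum_error(SAMPLE)
```

Running this over both attached log files would show whether every crash hits the same file and offset, which may help distinguish one damaged block from wider corruption.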
I will attach two log files. The first is from the first crash (right after the index build started) and the second is from a subsequent crash.
If you need more data, I will need a secure portal to upload it, because the files are large. Unfortunately, I can't upload any data files from this collection for security reasons.
When I started the server with the --noIndexBuildRetry option, the crashes stopped. I will run an initial sync on those two servers because I am not confident that this did not corrupt any data or indexes in my database.