-
Type: Build Failure
-
Resolution: Duplicate
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
8
-
Storage - Ra 2021-03-22, Storage - Ra 2021-04-05
Here's the BFG link I did some digging:
https://jira.mongodb.org/browse/BFG-778999
The error shows checkpoint times out on an eviction gen:
WiredTiger error error: 0 message: [1615372118:172][24168:0x7f4b7933b700], file:index-35-8537597942191832685.wt, WT_SESSION.checkpoint: __wt_gen_drain, 169: minutes < 4
By looking at the core dump, we are in the checkpoint stack and waiting on the following eviction to finish:
#0 0x00007f4b9021bfb7 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f4b9021d921 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x000055ae6cb37488 in __wt_abort (session=<optimized out>) at src/third_party/wiredtiger/src/os_common/os_abort.c:30
#3 0x000055ae6cc53e3e in __wt_gen_drain (session=0x7f4b81325f20, which=2, generation=45) at src/third_party/wiredtiger/src/support/generation.c:169
#4 0x000055ae6cc535c5 in __wt_gen_next_drain (session=0x7f4b81325f20, which=2) at src/third_party/wiredtiger/src/support/generation.c:97
#5 0x000055ae6c888941 in __wt_sync_file (session=<optimized out>, syncop=<optimized out>) at src/third_party/wiredtiger/src/btree/bt_sync.c:539
#6 0x000055ae6cd376df in __checkpoint_tree (session=0x7f4b81325f20, is_checkpoint=<optimized out>, cfg=<optimized out>) at src/third_party/wiredtiger/src/txn/txn_ckpt.c:1712
#7 0x000055ae6cd303e1 in __checkpoint_tree_helper (session=<optimized out>, cfg=<optimized out>) at src/third_party/wiredtiger/src/txn/txn_ckpt.c:1820
#8 __checkpoint_apply_to_dhandles (session=<optimized out>, cfg=<optimized out>, op=<optimized out>) at src/third_party/wiredtiger/src/txn/txn_ckpt.c:197
#9 __txn_checkpoint (session=<optimized out>, cfg=<optimized out>) at src/third_party/wiredtiger/src/txn/txn_ckpt.c:880
#10 __txn_checkpoint_wrapper (session=0x7f4b81325f20, cfg=0x7f4b79339300) at src/third_party/wiredtiger/src/txn/txn_ckpt.c:1093
#11 0x000055ae6cd2ae06 in __wt_txn_checkpoint (session=0x7f4b81325f20, cfg=0x7f4b79339300, waiting=<optimized out>) at src/third_party/wiredtiger/src/txn/txn_ckpt.c:1147
#12 0x000055ae6cc2eccb in __session_checkpoint (wt_session=0x7f4b81325f20, config=0x55ae74fe96e0 <str> "use_timestamp=true") at src/third_party/wiredtiger/src/session/session_api.c:1890
(gdb) p s
$3 = (WT_SESSION_IMPL *) 0x7f4b81324f60
While the same session is doing a history store walk when trying to insert a record to the history store in eviction:
#0 0x000055ae6c8cc5a8 in __wt_lex_compare_skip (user_item=<optimized out>, tree_item=<optimized out>, matchp=<optimized out>) at src/third_party/wiredtiger/src/include/btree_cmp_inline.h:126 #1 __wt_compare_skip (session=<optimized out>, collator=0x0, user_item=<optimized out>, tree_item=<optimized out>, cmpp=<optimized out>, matchp=<optimized out>) at src/third_party/wiredtiger/src/include/btree_cmp_inline.h:201 #2 __wt_search_insert (session=0x7f4b81324f60, cbt=0x6190001abd80, ins_head=0x60e000120d20, srch_key=0x7f4b5083e4a0) at src/third_party/wiredtiger/src/btree/row_srch.c:110 #3 0x000055ae6c7d1db9 in __cursor_skip_prev (cbt=0x6190001abd80) at src/third_party/wiredtiger/src/btree/bt_curprev.c:54 #4 0x000055ae6c7cb76b in __cursor_row_prev (cbt=<optimized out>, newpage=<optimized out>, restart=<optimized out>, skippedp=<optimized out>) at src/third_party/wiredtiger/src/btree/bt_curprev.c:509 #5 __wt_btcur_prev (cbt=0x6190001abd80, truncating=<optimized out>) at src/third_party/wiredtiger/src/btree/bt_curprev.c:656 #6 0x000055ae6c98e2c8 in __curfile_prev (cursor=<optimized out>) at src/third_party/wiredtiger/src/cursor/cur_file.c:146 #7 0x000055ae6c9b98a6 in __curhs_file_cursor_prev (session=0x7f4b81324f60, cursor=0x6190001abd80) at src/third_party/wiredtiger/src/cursor/cur_hs.c:104 #8 0x000055ae6c9ba529 in __curhs_prev_visible (session=<optimized out>, hs_cursor=<optimized out>) at src/third_party/wiredtiger/src/cursor/cur_hs.c:361 #9 0x000055ae6c9ae645 in __curhs_search_near (cursor=0x614001f56840, exactp=0x7f4b5083f140) at src/third_party/wiredtiger/src/cursor/cur_hs.c:683 #10 0x000055ae6c9a5d38 in __curhs_search_near_helper (session=<optimized out>, cursor=0x614001f56840, before=false) at src/third_party/wiredtiger/src/cursor/cur_hs.c:542 #11 0x000055ae6ca9cfa1 in __hs_insert_record (session=0x7f4b81324f60, cursor=0x614001f56840, btree=<optimized out>, key=0x604000b21290, type=3 '\003', hs_value=0x604000b20b90, tw=<optimized out>) at src/third_party/wiredtiger/src/history/hs_rec.c:156 #12 0x000055ae6ca980b2 in __wt_hs_insert_updates (session=0x7f4b81324f60, page=0x6070001a8700, multi=0xfe96a107ffc) at src/third_party/wiredtiger/src/history/hs_rec.c:604 #13 0x000055ae6cbc4f99 in __rec_hs_wrapup (session=<optimized out>, r=<optimized out>) at src/third_party/wiredtiger/src/reconcile/rec_write.c:2294 #14 __rec_write_wrapup (session=0x7f4b81324f60, r=0x61d0000b5480, page=0x6070001a8700) at src/third_party/wiredtiger/src/reconcile/rec_write.c:2120 #15 0x000055ae6cbb4a77 in __reconcile (session=<optimized out>, ref=<optimized out>, salvage=<optimized out>, flags=<optimized out>, page_lockedp=<optimized out>) at src/third_party/wiredtiger/src/reconcile/rec_write.c:219 #16 __wt_reconcile (session=0x7f4b81324f60, ref=0x60e000180e80, salvage=0x0, flags=179) at src/third_party/wiredtiger/src/reconcile/rec_write.c:99 #17 0x000055ae6ca776e3 in __evict_review (session=<optimized out>, ref=<optimized out>, evict_flags=<optimized out>, inmem_splitp=<optimized out>) at src/third_party/wiredtiger/src/evict/evict_page.c:699 #18 __wt_evict (session=0x7f4b81324f60, ref=0x60e000180e80, previous_state=3 '\003', flags=4) at src/third_party/wiredtiger/src/evict/evict_page.c:159 #19 0x000055ae6ca749e1 in __wt_page_release_evict (session=0x7f4b81324f60, ref=<optimized out>, flags=0) at src/third_party/wiredtiger/src/evict/evict_page.c:81 #20 0x000055ae6c7dc42e in __cursor_reset (cbt=<optimized out>) at src/third_party/wiredtiger/src/include/btree_inline.h:1561 #21 0x000055ae6c7e22ef in __wt_btcur_insert (cbt=0x61900022a180) at src/third_party/wiredtiger/src/btree/bt_cursor.c:879 #22 0x000055ae6c9965ce in __curfile_insert (cursor=<optimized out>) at src/third_party/wiredtiger/src/cursor/cur_file.c:266
I suspect it is stuck in the following while loop:
restart: /* * If the search stack does not point at the current item, fill it in with a search. */ recno = WT_INSERT_RECNO(cbt->ins); while ((current = cbt->ins) != PREV_INS(cbt, 0)) { if (CUR2BT(cbt)->type == BTREE_ROW) { key.data = WT_INSERT_KEY(current); key.size = WT_INSERT_KEY_SIZE(current); WT_RET(__wt_search_insert(session, cbt, cbt->ins_head, &key)); } else cbt->ins = __col_insert_search(cbt->ins_head, cbt->ins_stack, cbt->next_stack, recno); }
This starts to happen after we merge the history store restructure project (PM-1981). It is likely a fallout from it.