- Type: Build Failure
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Schema Management
- Storage Engines
- 5
- 2023-02-23 "Stoney Baloney", 2023-03-21 Ellen Ripley, 2023-04-04 Bibbidi-Bobbidi-Boo, 2023-05-30 - 7.0 Readiness, 2023-11-28 - Anthill Tiger, 2023-12-12 - Heisenbug, 2024-01-09 - I Grew Tired, StorEng - 2024-01-23, 2024-02-06 tapioooooooooooooca, 2024-02-20_A_near-death_puffin, 2024-03-05 - Claronald, 2024-03-19 - PacificOcean, 2024-04-02 - GreatMugshot, 나비 (nabi) - 2024-04-16, Nick - 2024-04-30
Context
The failure originates from BF-27630. The problem starts with this assertion failure:
WT_ASSERT(session, upd->txnid == txn->id || upd->txnid == WT_TXN_ABORTED);
The assertion fires while a transaction is rolling back an update on the update list, in this code:
if (S2C(session)->cache->hs_fileid != 0 && op->btree->id == S2C(session)->cache->hs_fileid)
    break;
WT_ASSERT(session, upd->txnid == txn->id || upd->txnid == WT_TXN_ABORTED);
upd->txnid = WT_TXN_ABORTED;
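The rollback step above can be modelled in isolation to show why an update with an unexpected txnid trips the check. This is a minimal sketch, not WiredTiger code: `struct model_update`, `MODEL_TXN_ABORTED` (assumed here to be `UINT64_MAX`), and both functions are hypothetical stand-ins, and the sketch returns false where the real code asserts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for WT_TXN_ABORTED; the value is an assumption of this sketch. */
#define MODEL_TXN_ABORTED UINT64_MAX

/* Hypothetical, heavily simplified stand-in for WT_UPDATE. */
struct model_update {
    uint64_t txnid;
    struct model_update *next;
};

/* The condition tested by the assertion: the update belongs to this
 * transaction or has already been marked aborted. */
static bool
model_rollback_check(const struct model_update *upd, uint64_t txn_id)
{
    return (upd->txnid == txn_id || upd->txnid == MODEL_TXN_ABORTED);
}

/* Mark one update aborted. Returns false (instead of asserting) when the
 * update's txnid is neither the caller's id nor the aborted marker. */
static bool
model_rollback_update(struct model_update *upd, uint64_t txn_id)
{
    assert(upd != NULL);
    if (!model_rollback_check(upd, txn_id))
        return (false);
    upd->txnid = MODEL_TXN_ABORTED;
    return (true);
}
```

In this model, an update whose txnid was overwritten after being freed (for example with a repeated 0xcd byte pattern) fails the check, which matches the way the assertion in the ticket is reached.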
Looking at the BF through GDB, the upd variable shows as:
(gdb) print *session->txn->mod->u.op_row.upd
$14 = WT_UPDATE:
('txnid', '14829735431805717965')
('durable_ts', '14829735431805717965')
('start_ts', '14829735431805717965')
('prev_durable_ts', '14829735431805717965')
('next', '0xcdcdcdcdcdcdcdcd')
('size', '3452816845')
('type', "205 '\\315'")
('prepare_state', "205 '\\315'")
('flags', "205 '\\315'")
('data', "0x7fa756be3bcf '\\315' <repeats 49 times>")
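The decimal values in the GDB output are easier to interpret as bytes. The sketch below uses hypothetical helper names (nothing here comes from WiredTiger) to check whether an integer consists of a single repeated byte, which is the case for the txnid, timestamp, and size fields printed above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if every byte of the 64-bit value v equals b. */
static bool
is_repeated_byte64(uint64_t v, uint8_t b)
{
    /* Multiplying by 0x0101010101010101 copies b into all eight bytes. */
    return (v == (uint64_t)b * UINT64_C(0x0101010101010101));
}

/* Return true if every byte of the 32-bit value v equals b. */
static bool
is_repeated_byte32(uint32_t v, uint8_t b)
{
    /* Multiplying by 0x01010101 copies b into all four bytes. */
    return (v == (uint32_t)b * UINT32_C(0x01010101));
}
```

For example, `is_repeated_byte64(UINT64_C(14829735431805717965), 0xcd)` and `is_repeated_byte32(UINT32_C(3452816845), 0xcd)` both hold, and the single-byte fields print as 205, which is 0xcd.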
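Every field of the update shows as repeated 0xcd bytes (for example, the txnid 14829735431805717965 is 0xcdcdcdcdcdcdcdcd). This means the memory has been freed, so the transaction is accessing freed memory when it tries to roll back an update from the update list. Further investigation under an ASAN build suggests that eviction may be involved, with the update being freed through __wt_evict_file while the sweep server closes the data handle: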
0x6060001ac6c0 is located 0 bytes inside of 61-byte region [0x6060001ac6c0,0x6060001ac6fd)
[j0:n2] freed by thread T15 here:
[j0:n2] #0 0x5559cbc8bc42 in free /data/mci/4c5523d6b930f0c1f82f5452d6add3b6/toolchain-builder/tmp/build-llvm-v4.sh-FAX/llvm-project-llvmorg/compiler-rt/lib/asan/asan_malloc_linux.cpp:127:3
[j0:n2] #1 0x5559cfb41f4f in __wt_free_update_list /data/mci/fc587e29670e1d65f0277fab520f25c9/src/src/third_party/wiredtiger/src/btree/bt_discard.c:478:9
[j0:n2] #2 0x5559cfb41f4f in __free_skip_list /data/mci/fc587e29670e1d65f0277fab520f25c9/src/src/third_party/wiredtiger/src/btree/bt_discard.c:438:13
[j0:n2] #3 0x5559cfb41f4f in __free_skip_array /data/mci/fc587e29670e1d65f0277fab520f25c9/src/src/third_party/wiredtiger/src/btree/bt_discard.c:418:13
[j0:n2] #4 0x5559cfb41f4f in __free_page_modify /data/mci/fc587e29670e1d65f0277fab520f25c9/src/src/third_party/wiredtiger/src/btree/bt_discard.c:222:13
[j0:n2] #5 0x5559cfb41f4f in __wt_page_out /data/mci/fc587e29670e1d65f0277fab520f25c9/src/src/third_party/wiredtiger/src/btree/bt_discard.c:116:9
[j0:n2] #6 0x5559cfc9e7ae in __wt_evict_file /data/mci/fc587e29670e1d65f0277fab520f25c9/src/src/third_party/wiredtiger/src/evict/evict_file.c:102:13
[j0:n2] #7 0x5559cfa6d1a9 in __wt_conn_dhandle_close /data/mci/fc587e29670e1d65f0277fab520f25c9/src/src/third_party/wiredtiger/src/conn/conn_dhandle.c:434:9
[j0:n2] #8 0x5559cfbd0202 in __sweep_discard_trees /data/mci/fc587e29670e1d65f0277fab520f25c9/src/src/third_party/wiredtiger/src/conn/conn_sweep.c:170:9
[j0:n2] #9 0x5559cfbd0202 in __sweep_server /data/mci/fc587e29670e1d65f0277fab520f25c9/src/src/third_party/wiredtiger/src/conn/conn_sweep.c:400:9
[j0:n2] #10 0x7f33a11cb2dd in start_thread (/lib64/libpthread.so.0+0x82dd)
- depends on
  - WT-12210 Assertion failure: test_lsm03.py in __txn_visible_id (Backlog)
- is related to
  - SERVER-74133 Spilling to TemporaryRecordStores in multi-doc transactions does not work as expected (Backlog)
  - WT-13284 Investigate why session->drop can EBUSY with `checkpoint_wait=true,lock_wait=true` (Closed)
  - SERVER-74085 Ensure queries that spill to TemporaryRecordStores checkpoint their data (Backlog)
  - WT-10677 Fix "No such file or directory" error in test_schema_abort (Closed)
  - WT-10751 Fix test_schema_abort failure with message: session.drop: table:wt.xxxx force=false: No such file or directory (Closed)
- related to
  - SERVER-73928 Defer lifetime drop of DeferredDropRecordStore (Closed)
  - WT-11133 Correctly drop dhandles to avoid use-after-free error (Closed)
  - WT-12182 Investigate tier_storage_copy failure in schema_abort (Closed)
  - WT-12183 Test handle sweep/close on a tree with prepared updates (Backlog)
  - SERVER-74033 Remove ident force drop in favour of handling ENOENT (Closed)