-
Type: Bug
-
Resolution: Fixed
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
Storage Engines 2019-11-18
Chasing WT-5078, and saw a failure twice that I think is different.
Here's the failing stacks:
Sun Sep 8 16:03:17 UTC 2019: -c CONFIG.5078 iteration 4 [1567959072:302857][54295:0x7f7b2effd700], t, file:wt.wt, WT_SESSION.truncate: __wt_delete_page, 113: ref->page_del->txnid == WT_TXN_ABORTED [1567959072:302888][54295:0x7f7b2effd700], t, file:wt.wt, WT_SESSION.truncate: __wt_abort, 22: process ID 54295: waiting for debugger... [1567959072:376507][54295:0x7f7b2a7f4700], t, file:wt.wt, WT_SESSION.truncate: __wt_delete_page, 113: ref->page_del->txnid == WT_TXN_ABORTED [1567959072:376536][54295:0x7f7b2a7f4700], t, file:wt.wt, WT_SESSION.truncate: __wt_abort, 22: process ID 54295: waiting for debugger... Thread 22 (Thread 0x7f7b2a7f4700 (LWP 56739)): #0 0x00007f7b4781c03f in __GI___select (nfds=0, readfds=0x0, writefds=0x0, exceptfds=0x0, timeout=0x7f7b2a7f3a60) at ../sysdeps/unix/sysv/linux/select.c:41 #1 0x000055880fa2720e in __wt_sleep (seconds=100, micro_seconds=0) at src/os_posix/os_sleep.c:30 #2 0x000055880fa1de21 in __wt_abort (session=0x7f7b48963540) at src/os_common/os_abort.c:26 #3 0x000055880fabdeab in __wt_delete_page (session=0x7f7b48963540, ref=0x558816989ac0, skipp=0x7f7b2a7f3b51) at src/btree/bt_delete.c:113 #4 0x000055880fafe208 in __tree_walk_internal (session=0x7f7b48963540, refp=0x7f7a880d0508, walkcntp=0x0, skip_func=0x0, func_cookie=0x0, flags=6274) at src/btree/bt_walk.c:486 #5 0x000055880fafe568 in __wt_tree_walk (session=0x7f7b48963540, refp=0x7f7a880d0508, flags=6272) at src/btree/bt_walk.c:572 #6 0x000055880fba001a in __wt_btcur_next (cbt=0x7f7a880d0390, truncating=true) at src/btree/bt_curnext.c:689 #7 0x000055880fabb3c5 in __cursor_truncate (session=0x7f7b48963540, start=0x7f7a880d0390, stop=0x7f7a895d6410, rmfunc=0x55880fab7880 <__cursor_row_modify>) at src/btree/bt_cursor.c:1676 #8 0x000055880fabb73d in __wt_btcur_range_truncate (start=0x7f7a880d0390, stop=0x7f7a895d6410) at src/btree/bt_cursor.c:1795 #9 0x000055880fb956dd in __wt_schema_range_truncate (session=0x7f7b48963540, start=0x7f7a880d0390, stop=0x7f7a895d6410) at src/schema/schema_truncate.c:141 #10 0x000055880fa59a42 in __wt_session_range_truncate (session=0x7f7b48963540, uri=0x0, start=0x7f7a880d0390, stop=0x7f7a895d6410) at src/session/session_api.c:1417 #11 0x000055880fa5a415 in __session_truncate (wt_session=0x7f7b48963540, uri=0x0, start=0x7f7a880d0390, stop=0x7f7a895d6410, config=0x0) at src/session/session_api.c:1488 #12 0x000055880f9b6428 in row_truncate (tinfo=0x55881ab231a0, cursor=0x7f7a880d0390) at ops.c:1443 #13 0x000055880f9b4376 in ops (arg=0x55881ab231a0) at ops.c:879 #14 0x00007f7b481556db in start_thread (arg=0x7f7b2a7f4700) at pthread_create.c:463 #15 0x00007f7b4782688f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 Thread 13 (Thread 0x7f7b2effd700 (LWP 56730)): #0 0x00007f7b4781c03f in __GI___select (nfds=0, readfds=0x0, writefds=0x0, exceptfds=0x0, timeout=0x7f7b2effca60) at ../sysdeps/unix/sysv/linux/select.c:41 #1 0x000055880fa2720e in __wt_sleep (seconds=100, micro_seconds=0) at src/os_posix/os_sleep.c:30 #2 0x000055880fa1de21 in __wt_abort (session=0x7f7b48968040) at src/os_common/os_abort.c:26 #3 0x000055880fabdeab in __wt_delete_page (session=0x7f7b48968040, ref=0x558817229730, skipp=0x7f7b2effcb51) at src/btree/bt_delete.c:113 #4 0x000055880fafe208 in __tree_walk_internal (session=0x7f7b48968040, refp=0x7f7ab04086b8, walkcntp=0x0, skip_func=0x0, func_cookie=0x0, flags=6274) at src/btree/bt_walk.c:486 #5 0x000055880fafe568 in __wt_tree_walk (session=0x7f7b48968040, refp=0x7f7ab04086b8, flags=6272) at src/btree/bt_walk.c:572 #6 0x000055880fba001a in __wt_btcur_next (cbt=0x7f7ab0408540, truncating=true) at src/btree/bt_curnext.c:689 #7 0x000055880fabb3c5 in __cursor_truncate (session=0x7f7b48968040, start=0x7f7ab0408540, stop=0x7f7ab015ba50, rmfunc=0x55880fab7880 <__cursor_row_modify>) at src/btree/bt_cursor.c:1676 #8 0x000055880fabb73d in __wt_btcur_range_truncate (start=0x7f7ab0408540, stop=0x7f7ab015ba50) at src/btree/bt_cursor.c:1795 #9 0x000055880fb956dd in __wt_schema_range_truncate (session=0x7f7b48968040, start=0x7f7ab0408540, stop=0x7f7ab015ba50) at src/schema/schema_truncate.c:141 #10 0x000055880fa59a42 in __wt_session_range_truncate (session=0x7f7b48968040, uri=0x0, start=0x7f7ab0408540, stop=0x7f7ab015ba50) at src/session/session_api.c:1417 #11 0x000055880fa5a415 in __session_truncate (wt_session=0x7f7b48968040, uri=0x0, start=0x7f7ab0408540, stop=0x7f7ab015ba50, config=0x0) at src/session/session_api.c:1488 #12 0x000055880f9b6428 in row_truncate (tinfo=0x558813d42fd0, cursor=0x7f7ab0408540) at ops.c:1443 #13 0x000055880f9b4376 in ops (arg=0x558813d42fd0) at ops.c:879 #14 0x00007f7b481556db in start_thread (arg=0x7f7b2effd700) at pthread_create.c:463 #15 0x00007f7b4782688f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
I think the two stacks occurred independently, I don't see any relationship between them, they're different WT_REFs, truncating ranges that are disjoint. They are both, however, failing in the aftermath of txnID=18446744073709551615, that is, they're seeing problems with objects deleted by the same transaction.
(gdb) thread 13 [Switching to thread 13 (Thread 0x7f7b2effd700 (LWP 56730))] #0 0x00007f7b4781c03f in __GI___select (nfds=0, readfds=0x0, writefds=0x0, exceptfds=0x0, timeout=0x7f7b2effca60) at ../sysdeps/unix/sysv/linux/select.c:41 41 in ../sysdeps/unix/sysv/linux/select.c (gdb) frame 3 #3 0x000055880fabdeab in __wt_delete_page (session=0x7f7b48968040, ref=0x558817229730, skipp=0x7f7b2effcb51) at src/btree/bt_delete.c:113 113 WT_ASSERT(session, ref->page_del->txnid == WT_TXN_ABORTED); (gdb) p *ref $14 = { page = 0x0, home = 0x558812c2b390, pindex_hint = 4, state = 3, addr = 0x5588172298a5, key = { recno = 64424509741, ikey = 0xf0000012d }, page_del = 0x7f7af80acb90, page_las = 0x0, hist = {{ session = 0x7f7b48968040, name = 0x55880fbc50dc "WT_SESSION.truncate", func = 0x55880fbd4290 <__func__.36683> "__wt_delete_page", time_sec = 1567959072, line = 101, state = 3 }, { session = 0x7f7b48961dd0, name = 0x55880fbc50dc "WT_SESSION.truncate", func = 0x55880fbd4390 <__func__.36703> "__wt_delete_page_rollback", time_sec = 1567959072, line = 189, state = 0 }, { session = 0x7f7b48961dd0, name = 0x55880fbc50dc "WT_SESSION.truncate", func = 0x55880fbd4290 <__func__.36683> "__wt_delete_page", time_sec = 1567959072, line = 155, state = 1 }}, histoff = 2 } (gdb) p *ref.page_del $15 = { txnid = 18446744073709551615, timestamp = 0, durable_timestamp = 0, prepare_state = 0 '\000', previous_state = 0, update_list = 0x0 } (gdb) thread 22 [Switching to thread 22 (Thread 0x7f7b2a7f4700 (LWP 56739))] #0 0x00007f7b4781c03f in __GI___select (nfds=0, readfds=0x0, writefds=0x0, exceptfds=0x0, timeout=0x7f7b2a7f3a60) at ../sysdeps/unix/sysv/linux/select.c:41 41 in ../sysdeps/unix/sysv/linux/select.c (gdb) frame 3 #3 0x000055880fabdeab in __wt_delete_page (session=0x7f7b48963540, ref=0x558816989ac0, skipp=0x7f7b2a7f3b51) at src/btree/bt_delete.c:113 113 WT_ASSERT(session, ref->page_del->txnid == WT_TXN_ABORTED); (gdb) p *ref $16 = { page = 0x0, home = 0x558818098770, pindex_hint = 18, state = 3, addr = 0x55881800866d, key = { recno = 81604379829, ikey = 0x13000004b5 }, page_del = 0x7f7acc0e96c0, page_las = 0x0, hist = {{ session = 0x7f7b48963540, name = 0x55880fbc50dc "WT_SESSION.truncate", func = 0x55880fbd4290 <__func__.36683> "__wt_delete_page", time_sec = 1567959072, line = 101, state = 3 }, { session = 0x7f7b48965f70, name = 0x55880fbc50dc "WT_SESSION.truncate", func = 0x55880fbd4390 <__func__.36703> "__wt_delete_page_rollback", time_sec = 1567959072, line = 189, state = 0 }, { session = 0x7f7b48965f70, name = 0x55880fbc50dc "WT_SESSION.truncate", func = 0x55880fbd4290 <__func__.36683> "__wt_delete_page", time_sec = 1567959072, line = 155, state = 1 }}, histoff = 2 } (gdb) p *ref.page_del $17 = { txnid = 18446744073709551615, timestamp = 0, durable_timestamp = 0, prepare_state = 0 '\000', previous_state = 0, update_list = 0x0 }
Here's the CONFIG (copied from WT-5078).
############################################ # RUN PARAMETERS ############################################ abort=0 alter=0 assert_commit_timestamp=1 assert_read_timestamp=0 auto_throttle=1 backups=1 bitcnt=7 bloom=1 bloom_bit_count=5 bloom_hash_count=28 bloom_oldest=0 cache=620 cache_minimum=20 checkpoints=wiredtiger checkpoint_log_size=71 checkpoint_wait=59 checksum=on chunk_size=10 compaction=0 compression=none data_extend=0 data_source=table delete_pct=33 dictionary=0 direct_io=0 encryption=none evict_max=2 file_type=row-store firstfit=0 huffman_key=0 huffman_value=0 independent_thread_rng=0 in_memory=0 insert_pct=25 internal_key_truncation=1 internal_page_max=12 isolation=snapshot key_gap=5 key_max=28 key_min=14 leaf_page_max=10 leak_memory=0 logging=0 logging_archive=0 logging_compression=none logging_file_max=436785 logging_prealloc=1 long_running_txn=0 lsm_worker_threads=3 memory_page_max=10 merge_max=15 mmap=1 modify_pct=22 ops=0 prefix_compression=0 prefix_compression_min=1 prepare=0 quiet=1 read_pct=16 rebalance=1 repeat_data_pct=27 reverse=1 rows=1000000 runs=1 salvage=1 split_pct=98 statistics=0 statistics_server=0 threads=31 timer=4 timing_stress_aggressive_sweep=0 timing_stress_checkpoint=0 timing_stress_lookaside_sweep=0 timing_stress_split_1=0 timing_stress_split_2=0 timing_stress_split_3=0 timing_stress_split_4=0 timing_stress_split_5=0 timing_stress_split_6=0 timing_stress_split_7=0 timing_stress_split_8=0 transaction_timestamps=1 transaction-frequency=100 truncate=1 value_max=883 value_min=2 verify=1 wiredtiger_config= write_pct=4 ############################################
- duplicates
-
WT-5174 Truncation rollback locking is insufficient
- Closed