Type: Bug
Resolution: Duplicate
Priority: Major - P3
Affects Version/s: None
Component/s: None
Running a patch build for WT-7519, I hit a failure as described below.
After puzzling over it a bit, I can't see how it could be related to that change, so I'm opening a new issue.
We fail running an ASAN build of format on the following config:
############################################
# RUN PARAMETERS: V2
############################################
assert.commit_timestamp=0
assert.read_timestamp=0
backup=1
backup.incremental=block
backup.incr_granularity=16384
btree.bitcnt=6
btree.compression=zlib
btree.dictionary=0
btree.huffman_value=0
btree.internal_key_truncation=1
btree.internal_page_max=15
btree.key_gap=2
btree.key_max=56
btree.key_min=21
btree.leaf_page_max=13
btree.memory_page_max=6
btree.prefix_compression=1
btree.prefix_compression_min=1
btree.repeat_data_pct=7
btree.reverse=0
btree.split_pct=70
btree.value_max=709
btree.value_min=20
cache=84
cache.evict_max=3
cache.minimum=20
checkpoint=on
checkpoint.log_size=119
checkpoint.wait=54
disk.checksum=uncompressed
disk.data_extend=0
disk.direct_io=0
disk.encryption=none
disk.firstfit=0
disk.mmap=1
disk.mmap_all=0
format.abort=0
format.independent_thread_rng=1
format.major_timeout=0
import=0
logging=1
logging.archive=0
logging.compression=none
logging.file_max=24619
logging.prealloc=1
lsm.auto_throttle=1
lsm.bloom=1
lsm.bloom_bit_count=33
lsm.bloom_hash_count=31
lsm.bloom_oldest=0
lsm.chunk_size=2
lsm.merge_max=5
lsm.worker_threads=3
ops.alter=0
ops.compaction=0
ops.hs_cursor=1
ops.pct.delete=0
ops.pct.insert=1
ops.pct.modify=12
ops.pct.read=0
ops.pct.write=87
ops.prepare=0
ops.random_cursor=0
ops.salvage=0
ops.truncate=1
ops.verify=1
quiet=1
runs=1
runs.in_memory=0
runs.ops=0
runs.rows=100000
runs.source=table
runs.threads=6
runs.timer=4
runs.type=row-store
runs.verify_failure_dump=0
statistics=0
statistics.server=0
stress.aggressive_sweep=0
stress.checkpoint=0
stress.checkpoint_prepare=0
stress.hs_checkpoint_delay=0
stress.hs_search=1
stress.hs_sweep=0
stress.split_1=0
stress.split_2=0
stress.split_3=0
stress.split_4=0
stress.split_5=0
stress.split_6=0
stress.split_7=0
stress.split_8=0
transaction.frequency=100
transaction.isolation=snapshot
transaction.rollback_to_stable=0
transaction.timestamps=1
wiredtiger.config=
wiredtiger.rwlock=1
wiredtiger.leak_memory=0
############################################
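For anyone trying to reproduce: a run like this can typically be replayed by saving the dumped parameters (one per line, as above) to a CONFIG file and handing it to test/format's `t` binary. The build path and the `-h` home directory below are illustrative assumptions, not taken from this failure:

```shell
# Assumes test/format has already been built; paths are illustrative.
cd build_posix/test/format
./t -c CONFIG -h RUNDIR.1   # -c: run-parameter file, -h: run directory
```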
It's a "cache stuck" error, hit after a number of transactions have been rolled back for eviction:
[2021/05/12 02:08:58.912] format.sh: job in /data/mci/71d478ee5cba875843c23ea5b22ebad8/wiredtiger/test/format/RUNDIR.1 failed
[2021/05/12 02:08:58.913] t: process 11942 running
[2021/05/12 02:08:58.913] 0x7f8c2ac8d7a0:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.913] 0x7f8c2ac8db98:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.913] 0x7f8c2ac8e388:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.913] 0x7f8c2ac8e780:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.913] 0x7f8c2ac8eb78:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.913] 0x7f8c2ac8d7a0:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.913] 0x7f8c2ac8df90:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.913] 0x7f8c2ac8e780:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.913] 0x7f8c2ac8eb78:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.913] 0x7f8c2ac8db98:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.913] 0x7f8c2ac8eb78:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.913] 0x7f8c2ac8db98:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.913] 0x7f8c2ac8e388:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.913] 0x7f8c2ac8df90:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.913] 0x7f8c2ac8e388:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.913] 0x7f8c2ac8e388:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.913] 0x7f8c2ac8eb78:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.913] 0x7f8c2ac8db98:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.913] 0x7f8c2ac8e780:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.913] 0x7f8c2ac8e388:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.913] 0x7f8c2ac8e780:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.913] 0x7f8c2ac8eb78:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.913] 0x7f8c2ac8e780:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.913] 0x7f8c2ac8db98:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.913] 0x7f8c2ac8db98:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.913] 0x7f8c2ac8e780:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.913] 0x7f8c2ac8db98:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.914] 0x7f8c2ac8df90:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.914] 0x7f8c2ac8e388:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.914] 0x7f8c2ac8eb78:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.914] 0x7f8c2ac8d7a0:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.914] 0x7f8c2ac8d7a0:oldest pinned transaction ID rolled back for eviction
[2021/05/12 02:08:58.914] [1620785336:267393][11942:0x7f8c1e51b700], eviction-server: __evict_server, 452: Cache stuck for too long, giving up: Connection timed out
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:transaction state dump
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:current ID: 17132
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:last running ID: 17132
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:metadata_pinned ID: 9723
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:oldest ID: 17132
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:durable timestamp: (0, 23606)
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:oldest timestamp: (0, 23596)
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:pinned timestamp: (0, 23596)
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:stable timestamp: (0, 23606)
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:has_durable_timestamp: yes
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:has_oldest_timestamp: yes
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:has_pinned_timestamp: yes
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:has_stable_timestamp: yes
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:oldest_is_pinned: yes
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:stable_is_pinned: no
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:checkpoint running: yes
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:checkpoint generation: 3
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:checkpoint pinned ID: 9693
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:checkpoint txn ID: 9723
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:session count: 22
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:Transaction state of active sessions:
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:ID: 0, pinned ID: 17132, metadata pinned ID: 17132, name: WT_CURSOR.next
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:transaction id: 9723, mod count: 0, snap min: 9693, snap max: 9723, snapshot count: 6, commit_timestamp: (0, 0), durable_timestamp: (0, 0), first_commit_timestamp: (0, 0), prepare_timestamp: (0, 0), pinned_durable_timestamp: (0, 0), read_timestamp: (0, 0), checkpoint LSN: [2][2420992], full checkpoint: true, rollback reason: , flags: 0x0000040c, isolation: WT_ISO_READ_UNCOMMITTED
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:cache dump
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:cache full: no
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:cache clean check: no (24.562%)
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:cache dirty check: yes (22.684%)
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:cache updates check: no (9.964%)
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:file:wt.wt(<live>):
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:internal: 1 pages, 0MB, 0/1 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:leaf: 1440 pages, 16MB, 1/1439 clean/dirty pages, 0/16/7 clean/dirty/updates MB, 0MB max page, 0MB max dirty page
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:file:WiredTigerHS.wt(<live>):
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:internal: 1 pages, 0MB, 0/1 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:leaf: 24 pages, 1MB, 15/9 clean/dirty pages, 1/0/0 clean/dirty/updates MB, 0MB max page, 0MB max dirty page
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:file:WiredTiger.wt(<live>):
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:internal: 1 pages, 0MB, 0/1 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:leaf: 0 pages
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:cache dump: total found: 20MB vs tracked inuse 19MB
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:total dirty bytes: 18MB vs tracked dirty 18MB
[2021/05/12 02:08:58.914] 0x7f8c2ac8bfd0:total updates bytes: 7MB vs tracked updates 8MB
[2021/05/12 02:08:58.914] [1620785336:269292][11942:0x7f8c1e51b700], eviction-server: __wt_evict_thread_run, 324: cache eviction thread error: Connection timed out
[2021/05/12 02:08:58.914] [1620785336:269311][11942:0x7f8c1e51b700], eviction-server: __wt_evict_thread_run, 324: the process must exit and restart: WT_PANIC: WiredTiger library panic
[2021/05/12 02:08:58.914] [1620785336:269323][11942:0x7f8c1e51b700], eviction-server: __wt_abort, 28: aborting WiredTiger library
Nothing jumps out at me in the core dump. Apart from the aborted eviction thread, only two other threads are active. One is performing a checkpoint and appears to have been actively running at the time of the failure:
Thread 6 (Thread 0x7f8c17bcd700 (LWP 11985)):
#0  0x000000000063f858 in __curfile_next (cursor=0x619000ba1d80) at ../src/cursor/cur_file.c:90
#1  0x00000000006721e0 in __curhs_file_cursor_next (session=0x7f8c2ac8fb58, cursor=0x619000ba1d80) at ../src/cursor/cur_hs.c:91
#2  0x0000000000666232 in __curhs_search_near (cursor=0x6140000f4c40, exactp=0x7f8c17bc8140) at ../src/cursor/cur_hs.c:708
#3  0x000000000065b0c7 in __curhs_search_near_helper (session=0x7f8c2ac8fb58, cursor=0x6140000f4c40, before=false) at ../src/cursor/cur_hs.c:556
#4  0x000000000065bdff in __wt_curhs_search_near_after (session=0x7f8c2ac8fb58, cursor=0x6140000f4c40) at ../src/cursor/cur_hs.c:540
#5  0x0000000000d35d15 in __hs_insert_record (session=0x7f8c2ac8fb58, cursor=0x6140000f4c40, btree=0x617000001c80, key=0x604000028290, type=3 '\003', hs_value=0x604000028450, tw=0x7f8c17bc91e0) at ../src/history/hs_rec.c:183
#6  0x0000000000d3180d in __wt_hs_insert_updates (session=0x7f8c2ac8fb58, page=0x61600011a680, multi=0x619001be2c80, cache_write_hs=0x61d000220b90) at ../src/history/hs_rec.c:638
#7  0x0000000000827a3e in __rec_hs_wrapup (session=0x7f8c2ac8fb58, r=0x61d000220680) at ../src/reconcile/rec_write.c:2319
#8  0x000000000080d8bb in __rec_write_wrapup (session=0x7f8c2ac8fb58, r=0x61d000220680, page=0x61600011a680) at ../src/reconcile/rec_write.c:2145
#9  0x00000000007fc8ab in __reconcile (session=0x7f8c2ac8fb58, ref=0x60e0000fb780, salvage=0x0, flags=36, page_lockedp=0x7f8c17bcae60) at ../src/reconcile/rec_write.c:219
#10 0x00000000007fab05 in __wt_reconcile (session=0x7f8c2ac8fb58, ref=0x60e0000fb780, salvage=0x0, flags=36) at ../src/reconcile/rec_write.c:99
#11 0x0000000000bacbb5 in __wt_sync_file (session=0x7f8c2ac8fb58, syncop=WT_SYNC_CHECKPOINT) at ../src/btree/bt_sync.c:642
#12 0x000000000099f2f7 in __checkpoint_tree (session=0x7f8c2ac8fb58, is_checkpoint=true, cfg=0x7f8c17bccbc0) at ../src/txn/txn_ckpt.c:1728
#13 0x00000000009a9eeb in __checkpoint_tree_helper (session=0x7f8c2ac8fb58, cfg=0x7f8c17bccbc0) at ../src/txn/txn_ckpt.c:1836
#14 0x00000000009a9c7d in __checkpoint_apply_to_dhandles (session=0x7f8c2ac8fb58, cfg=0x7f8c17bccbc0, op=0x9a9d50 <__checkpoint_tree_helper>) at ../src/txn/txn_ckpt.c:197
#15 0x00000000009a21f1 in __txn_checkpoint (session=0x7f8c2ac8fb58, cfg=0x7f8c17bccbc0) at ../src/txn/txn_ckpt.c:880
#16 0x000000000099d356 in __txn_checkpoint_wrapper (session=0x7f8c2ac8fb58, cfg=0x7f8c17bccbc0) at ../src/txn/txn_ckpt.c:1109
#17 0x000000000099cc60 in __wt_txn_checkpoint (session=0x7f8c2ac8fb58, cfg=0x7f8c17bccbc0, waiting=true) at ../src/txn/txn_ckpt.c:1163
#18 0x00000000008a5630 in __session_checkpoint (wt_session=0x7f8c2ac8fb58, config=0x0) at ../src/session/session_api.c:1902
#19 0x00000000004fdf5e in checkpoint (arg=0x0) at ../../../test/format/checkpoint.c:121
#20 0x00007f8c2a63b6db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#21 0x00007f8c2976d71f in clone () from /lib/x86_64-linux-gnu/libc.so.6
Another thread is opening a backup cursor but is blocked on the checkpoint lock, which makes sense as the previous thread is in checkpoint.
Thread 7 (Thread 0x7f8c1940a700 (LWP 11982)):
#0  0x00007f8c2a6451fd in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f8c2a63e025 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00000000006226f0 in __wt_spin_lock (session=0x7f8c2ac8ef70, t=0x6230000171e8) at ../src/include/mutex_inline.h:170
#3  0x0000000000620d6f in __wt_spin_lock_track (session=0x7f8c2ac8ef70, t=0x6230000171e8) at ../src/include/mutex_inline.h:318
#4  0x0000000000616eae in __wt_curbackup_open (session=0x7f8c2ac8ef70, uri=0xe32f40 <str> "backup:", other=0x0, cfg=0x7f8c19409820, cursorp=0x7f8c194097e0) at ../src/cursor/cur_backup.c:305
#5  0x0000000000864f79 in __session_open_cursor_int (session=0x7f8c2ac8ef70, uri=0xe32f40 <str> "backup:", owner=0x0, other=0x0, cfg=0x7f8c19409820, hash_value=1599835161024497006, cursorp=0x7f8c194097e0) at ../src/session/session_api.c:495
#6  0x000000000086997d in __session_open_cursor (wt_session=0x7f8c2ac8ef70, uri=0xe32f40 <str> "backup:", to_dup=0x0, config=0x7f8c19409a20 "incremental=(enabled,granularity=16384K,this_id=1620784951)", cursorp=0x7f8c194099c0) at ../src/session/session_api.c:615
#7  0x00000000004f432d in backup (arg=0x0) at ../../../test/format/backup.c:591
#8  0x00007f8c2a63b6db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#9  0x00007f8c2976d71f in clone () from /lib/x86_64-linux-gnu/libc.so.6
Duplicates: WT-6394 Cache stuck dirty (non prepare) (Closed)