-
Type: Task
-
Resolution: Cannot Reproduce
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
I noticed that wiredtiger-test-format had been running on tinderbox for 4 days. It is hung. The config:
############################################ # RUN PARAMETERS ############################################ abort=0 auto_throttle=1 firstfit=0 bitcnt=5 bloom=1 bloom_bit_count=17 bloom_hash_count=16 bloom_oldest=0 cache=1 checkpoints=1 checksum=uncompressed chunk_size=8 compaction=0 compression=none data_extend=0 data_source=file delete_pct=31 dictionary=0 evict_max=0 file_type=row-store backups=1 huffman_key=0 huffman_value=0 insert_pct=7 internal_key_truncation=1 internal_page_max=17 isolation=read-committed key_gap=4 key_max=122 key_min=17 leak_memory=0 leaf_page_max=10 logging=0 logging_archive=1 logging_prealloc=0 logging=0 lsm_worker_threads=3 merge_max=6 mmap=1 ops=1000 prefix_compression=1 prefix_compression_min=0 repeat_data_pct=64 reverse=0 rows=10000 runs=50 split_pct=80 statistics=1 threads=1 timer=0 value_max=1107 value_min=18 wiredtiger_config= write_pct=50 ############################################
Attaching to the process here are the stacks:
(gdb) thread apply all bt Thread 3 (Thread 0x7f013687d700 (LWP 127010)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 WT-1 0x00000000004354af in __wt_cond_wait (session=session@entry=0x1cc7b40, cond=0x1ce2a10, usecs=usecs@entry=100000) at ../src/os_posix/os_mtx_cond.c:77 WT-2 0x000000000042402e in __evict_server (arg=0x1cc7b40) at ../src/evict/evict_lru.c:195 WT-3 0x0000003f17e07ee5 in start_thread (arg=0x7f013687d700) at pthread_create.c:309 WT-4 0x0000003f176f4b8d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 Thread 2 (Thread 0x7f013607c700 (LWP 127011)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 WT-1 0x00000000004354af in __wt_cond_wait (session=session@entry=0x1cc7e00, cond=0x1cf9250, usecs=usecs@entry=10000000) at ../src/os_posix/os_mtx_cond.c:77 WT-2 0x000000000041c744 in __sweep_server (arg=0x1cc7e00) at ../src/conn/conn_sweep.c:122 WT-3 0x0000003f17e07ee5 in start_thread (arg=0x7f013607c700) at pthread_create.c:309 WT-4 0x0000003f176f4b8d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 Thread 1 (Thread 0x7f0136a8a740 (LWP 94803)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 WT-1 0x00000000004354af in __wt_cond_wait (session=session@entry=0x1cc80c0, cond=0x1ce2a80, usecs=usecs@entry=100000) at ../src/os_posix/os_mtx_cond.c:77 WT-2 0x00000000004c523c in __wt_cache_full_check (session=0x1cc80c0) at ../src/include/cache.i:166 WT-3 __cursor_enter (session=0x1cc80c0) at ../src/include/cursor.i:62 WT-4 __curfile_enter (cbt=<optimized out>, cbt=<optimized out>) at ../src/include/cursor.i:97 WT-5 __cursor_func_init (reenter=1, cbt=0x24a2c50) at ../src/include/cursor.i:184 WT-6 __wt_btcur_search (cbt=cbt@entry=0x24a2c50) at ../src/btree/bt_cursor.c:307 WT-7 0x0000000000493533 in __curfile_search (cursor=0x24a2c50) at ../src/cursor/cur_file.c:166 WT-8 0x00000000004127a4 in read_row (keyno=2, key=0x7fff0744e670, cursor=0x24a2c50) at ../../../test/format/ops.c:601 WT-9 wts_read_scan () at ../../../test/format/ops.c:560 WT-10 0x000000000040d8d7 in main (argc=<optimized out>, argv=<optimized out>) at ../../../test/format/t.c:187
This is a tiny cache of 1Mb.
(gdb) p pages_evicted $2 = 2166 (gdb) p *cache $3 = {bytes_inmem = 6458665, pages_inmem = 2169, bytes_evict = 5202306, pages_evict = 2166, bytes_dirty = 1891, pages_dirty = 1, read_gen = 4450648, evict_cond = 0x1ce2a10, evict_lock = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = { __prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, counter = 0, name = 0x65e6a1 "cache eviction", id = 0 '\000', initialized = 1 '\001'}, evict_walk_lock = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, counter = 0, name = 0x65e6b0 "cache walk", id = 0 '\000', initialized = 1 '\001'}, evict_waiter_cond = 0x1ce2a80, eviction_trigger = 95, eviction_target = 80, eviction_dirty_target = 80, evict = 0x1ce2af0, evict_current = 0x0, evict_candidates = 0, evict_entries = 0, evict_max = 0, evict_slots = 400, evict_file_next = 0x0, sync_request = 0, sync_complete = 0, cp_saved_evict = 0, cp_current_evict = 0, cp_skip_count = 0, cp_reserved = 0, cp_session = 0x0, cp_tid = 0, flags = 16} (gdb) p conn->cache_size $5 = 1048576
We have one, very large (larger than cache), internal page that says it is already on disk:
(gdb) p *ref->page $32 = {u = {intl = {recno = 0, parent_ref = 0x1cfa0c0, __index = 0x24e6920}, row = {d = 0x0, ins = 0x1cfa0c0, upd = 0x24e6920, entries = 0}, col_fix = { recno = 0, bitf = 0x1cfa0c0 "\360:\373\001", entries = 38693152}, col_var = {recno = 0, d = 0x1cfa0c0, repeats = 0x24e6920, nrepeats = 0, entries = 0}}, dsk = 0x1fb3c00, modify = 0x0, read_gen = 0, memory_footprint = 1254123, type = 6 '\006', flags_atomic = 2 '\002'} (gdb) p *ref $35 = {page = 0x0, home = 0x1fb3af0, ref_hint = 0, state = WT_REF_DISK, addr = 0x1fb3c2a, key = {recno = 4294967379, ikey = 0x100000053}, page_del = 0x0}
- is related to
-
WT-1613 test/format stress hang - eviction stuck
- Closed
-
WT-1646 test/format stress hang
- Closed
-
WT-1759 Limit size of internal pages
- Closed
-
WT-1788 Review eviction server algorithm
- Closed
-
WT-1851 test/format hung in LSM tree.
- Closed
- related to
-
WT-1 placeholder WT-1
- Closed
-
WT-2 What does metadata look like?
- Closed
-
WT-3 What file formats are required?
- Closed
-
WT-4 Flexible cursor traversals
- Closed
-
WT-5 How does pget work: is it necessary?
- Closed
-
WT-6 Complex schema example
- Closed
-
WT-7 Do we need the handle->err/errx methods?
- Closed
-
WT-8 Do we need table load, bulk-load and/or dump methods?
- Closed
-
WT-9 Does adding schema need to be transactional?
- Closed
-
WT-10 Basic "getting started" tutorial
- Closed