Type: Task
Resolution: Done
Affects Version/s: None
Component/s: None
Running test/format with the following configuration:
############################################
# RUN PARAMETERS
############################################
# bitcnt not applicable to this run
cache=94
compression=bzip
data_extend=0
data_source=lsm
delete_pct=14
dictionary=0
file_type=row-store
hot_backups=0
huffman_key=0
huffman_value=0
insert_pct=40
internal_key_truncation=0
internal_page_max=14
key_gap=4
key_max=102
key_min=27
leaf_page_max=21
ops=382656
prefix=1
repeat_data_pct=37
reverse=0
rows=600067
runs=0
split_pct=65
threads=10
value_max=2186
value_min=3
# wiredtiger_config not applicable to this run
write_pct=5
############################################
The application ends up stuck: it is not making any progress at all. All application threads have this call stack:
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
#1 0x0000000000423c5f in __wt_cond_wait (session=0x8c4b40, cond=0x8cb8a0, usecs=10000) at ../src/os_posix/os_mtx.c:75
#2 0x000000000044488a in __wt_cache_full_check (session=0x8c4b40, onepass=0) at ../src/include/cache.i:87
#3 0x000000000044498b in __wt_page_in_func (session=0x8c4b40, parent=0x7fffe8b9b550, ref=0x7fffe8b9bad0, file=0x66beb6 "../src/btree/row_srch.c", line=201) at ../src/btree/bt_page.c:47
#4 0x00000000004a3c3a in __wt_page_swap_func (session=0x8c4b40, out=0x7fffe8b9b550, in=0x7fffe8b9b550, inref=0x7fffe8b9bad0, file=0x66beb6 "../src/btree/row_srch.c", line=201) at ../src/include/btree.i:489
The eviction server is looping as expected, populating the eviction queue. However, the WT_EVICT_NO_PROGRESS flag is never cleared, which means no pages are being successfully evicted.
The WT_EVICT_STUCK flag is set, but the clause at bt_evict.c:__evict_get_page:961 that aborts transactions is never firing. I wonder if the __wt_txn_oldest check isn't working as expected?
We should figure out how to make progress. I suspect that all pages have open hazard references. I'll need to look more carefully at the state of the cache.
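To make the hazard-reference suspicion concrete, here is a toy model of the situation. It is not WiredTiger code: the `toy_` types, the `TOY_EVICT_*` flags, and the `stuck_after` threshold are all hypothetical names invented for illustration, and the real eviction server is considerably more involved. The sketch only shows how an eviction pass that skips every pinned page can keep looping without ever clearing a no-progress flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags, loosely named after the ones in this report. */
#define TOY_EVICT_NO_PROGRESS 0x1u
#define TOY_EVICT_STUCK       0x2u

/* Hypothetical toy page; this is not WiredTiger's page structure. */
struct toy_page {
    int hazard_refs; /* readers currently pinning the page */
    bool in_cache;   /* true while the page occupies cache space */
};

/* Hypothetical eviction state tracked across passes. */
struct toy_evict_state {
    unsigned flags;
    int idle_passes; /* consecutive passes that evicted nothing */
};

/* One pass over the pages: evict everything that is not pinned. */
static size_t
toy_evict_pass(struct toy_page *pages, size_t n)
{
    size_t evicted = 0;

    for (size_t i = 0; i < n; i++) {
        if (pages[i].in_cache && pages[i].hazard_refs == 0) {
            pages[i].in_cache = false;
            evicted++;
        }
    }
    return evicted;
}

/*
 * Run one pass and update the flags. A pass that evicts at least one
 * page clears both flags; a pass that evicts nothing sets NO_PROGRESS,
 * and STUCK once stuck_after consecutive passes have made no progress.
 */
static size_t
toy_evict_step(struct toy_page *pages, size_t n,
    struct toy_evict_state *st, int stuck_after)
{
    size_t evicted = toy_evict_pass(pages, n);

    if (evicted > 0) {
        st->flags &= ~(TOY_EVICT_NO_PROGRESS | TOY_EVICT_STUCK);
        st->idle_passes = 0;
    } else {
        st->flags |= TOY_EVICT_NO_PROGRESS;
        if (++st->idle_passes >= stuck_after)
            st->flags |= TOY_EVICT_STUCK;
    }
    return evicted;
}
```

In this model, if every page has `hazard_refs > 0`, repeated calls to `toy_evict_step` return 0 forever and both flags stay set. Dropping a single hazard reference lets the next pass evict that page and clear the flags. Whether the real hang follows this pattern still needs to be confirmed by inspecting the cache state.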