This issue was noticed while working on WT-2731, with the following test/format configuration:
############################################
# RUN PARAMETERS
############################################
abort=0
auto_throttle=1
backups=0
bitcnt=4
bloom=1
bloom_bit_count=4
bloom_hash_count=30
bloom_oldest=0
cache=5
checkpoints=1
checksum=uncompressed
chunk_size=4
compaction=0
compression=zlib
data_extend=0
data_source=file
delete_pct=35
dictionary=0
direct_io=0
encryption=rotn-7
evict_max=0
file_type=row-store
firstfit=1
huffman_key=0
huffman_value=0
in_memory=0
insert_pct=24
internal_key_truncation=1
internal_page_max=11
isolation=snapshot
key_gap=9
key_max=51
key_min=18
leaf_page_max=17
leak_memory=0
logging=0
logging_archive=0
logging_compression=none
logging_prealloc=0
long_running_txn=0
lsm_worker_threads=3
merge_max=16
mmap=1
ops=100000
prefix_compression=1
prefix_compression_min=6
quiet=1
repeat_data_pct=66
reverse=0
rows=100000
runs=1
rebalance=1
salvage=1
split_pct=69
statistics=1
statistics_server=0
threads=4
timer=20
transaction-frequency=87
value_max=3294
value_min=8
verify=1
wiredtiger_config=
write_pct=71
############################################
The cache dump looks like:
==========
cache dump
file:wt(<live>):
  internal pages: 1 pages, 1545 max, 0MB total
  leaf pages: 4 pages, 1392140 max, 4MB total
  dirty pages: 1 pages, 1545 max, 0MB total
file:WiredTigerLAS.wt(<live>):
  internal pages: 1 pages, 249 max, 0MB total
  leaf pages: 1 pages, 412 max, 0MB total
  dirty pages: 1 pages, 249 max, 0MB total
file:WiredTiger.wt(<live>):
  internal pages: 1 pages, 249 max, 0MB total
  dirty pages: 1 pages, 249 max, 0MB total
cache dump: total found = 5MB vs tracked inuse 5MB
==========
There are 4 clean leaf pages in the cache, and 4 threads running snapshot-isolation transactions, each pinning a single page. In this case, I'd expect the diagnostic stuck-cache check to fire, but it doesn't. After some time in a debugger, it appears that eviction activity is still happening, on pages from the lookaside file:
(gdb) where
#0 __wt_cache_page_evict (session=0x632000001500, page=0x6080000d2020) at ../src/include/btree.i:302
#1 0x0000000000ac1d1e in __wt_page_out (session=0x632000001500, pagep=0x60400000d790) at ../src/btree/bt_discard.c:104
#2 0x0000000000ac0d93 in __wt_ref_out (session=0x632000001500, ref=0x60400000d790) at ../src/btree/bt_discard.c:33
#3 0x0000000000651ae5 in __evict_page_clean_update (session=0x632000001500, ref=0x60400000d790, closing=false) at ../src/evict/evict_page.c:224
#4 0x000000000064cfe0 in __wt_evict (session=<optimized out>, ref=<optimized out>, closing=<optimized out>) at ../src/evict/evict_page.c:121
#5 0x0000000000628dd6 in __evict_page (session=0x632000001500, is_server=true) at ../src/evict/evict_lru.c:1665
#6 0x0000000000639baa in __evict_lru_pages (session=0x632000001500, is_server=true) at ../src/evict/evict_lru.c:916
#7 0x000000000063b93b in __evict_pass (session=0x632000001500) at ../src/evict/evict_lru.c:677
#8 0x00000000006368ab in __evict_server (session=0x632000001500, did_work=0x7f68c5ffee30) at ../src/evict/evict_lru.c:271
#9 0x000000000061c5f9 in __evict_thread_run (arg=0x632000001500) at ../src/evict/evict_lru.c:207
#10 0x00007f68cace5df3 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f68c9ecf1ad in clone () from /lib64/libc.so.6
(gdb) p page->memory_footprint
$43 = 412
(gdb) p page->dsk
$44 = (const WT_PAGE_HEADER *) 0x6120002824c0
(gdb) p *$44
$45 = {recno = 0, write_gen = 123, mem_size = 316, u = {entries = 8, datalen = 8}, type = 7 '\a', flags = 12 '\f', unused = "\000"}
(gdb) p session->dhandle->name
$46 = 0x60300000d750 "file:WiredTigerLAS.wt"
Specifically, the __wt_las_sweep function, called from the sweep server, is what triggers this cache activity:
(gdb) where
#0 __wt_las_sweep (session=0x632000001840) at ../src/cache/cache_las.c:289
#1 0x00000000005bf76d in __sweep_server (arg=0x632000001840) at ../src/conn/conn_sweep.c:283
#2 0x00007f68cace5df3 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f68c9ecf1ad in clone () from /lib64/libc.so.6
We should stop counting evictions of lookaside file pages toward the cache->evict_page count, so that the diagnostic stuck-cache check fires as expected.