-
Type: Bug
-
Resolution: Done
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: None
-
None
This ticket has evolved. The biggest problem appears to be that, with zlib compression, we are creating a page on disk that is 2.5MB when uncompressed, while the cache size is only 1MB. A page swap involving that page therefore causes eviction to stall.
Old analysis follows:
There is a test/format workload that has hung due to a full cache. The configuration has a single worker thread and a 1MB cache.
The only application thread is:
#2 0x000000000043196e in __wt_cond_wait (session=0x2b778b0, cond=0x2b75b50, usecs=100000) at ../src/include/misc.i:18
#3 0x0000000000435d08 in __wt_cache_eviction_worker (session=0x2b778b0, busy=false, pct_full=328) at ../src/evict/evict_lru.c:1544
#4 0x00000000004a5802 in __wt_cache_eviction_check (session=0x2b778b0, busy=false, didworkp=0x0) at ../src/include/cache.i:236
#5 0x00000000004a5f54 in __wt_txn_begin (session=0x2b778b0, cfg=0x0) at ../src/include/txn.i:266
#6 0x00000000004a5fd4 in __wt_txn_autocommit_check (session=0x2b778b0) at ../src/include/txn.i:287
#7 0x00000000004a842f in __wt_page_in_func (session=0x2b778b0, ref=0x345a5c0, flags=0, file=0x6c2f45 "../src/btree/col_srch.c", line=93) at ../src/btree/bt_read.c:575
#8 0x00000000004c237d in __wt_page_swap_func (session=0x2b778b0, held=0x2b75ec0, want=0x345a5c0, flags=0, file=0x6c2f45 "../src/btree/col_srch.c", line=93) at ../src/include/btree.i:1260
#9 0x00000000004c2baa in __wt_col_search (session=0x2b778b0, recno=87641, leaf=0x0, cbt=0x7f23b0032d40) at ../src/btree/col_srch.c:93
#10 0x0000000000516f4d in __cursor_col_search (session=0x2b778b0, cbt=0x7f23b0032d40, leaf=0x0) at ../src/btree/bt_cursor.c:226
#11 0x0000000000518821 in __wt_btcur_remove (cbt=0x7f23b0032d40) at ../src/btree/bt_cursor.c:670
#12 0x00000000004da976 in __curfile_remove (cursor=0x7f23b0032d40) at ../src/cursor/cur_file.c:331
#13 0x00000000004131fa in col_remove (cursor=0x7f23b0032d40, key=0x7f23c891cdf0, keyno=87641, notfoundp=0x7f23c891cda4) at ../../../test/format/ops.c:1183
#14 0x00000000004115fa in ops (arg=0x255cd50) at ../../../test/format/ops.c:426
That is, an auto-commit transaction is doing a cache-full check before allocating an ID.
There are 5 pages in cache, with 848 bytes in internal pages. Two of the pages belong to file:wt: one is a small internal page, the other is a 2.5MB leaf page.
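To make the size mismatch concrete, here is a minimal C sketch of the arithmetic, using the 2.5MB page and 1MB cache sizes from this ticket. The helper names are hypothetical and are not WiredTiger functions; the real cache accounting is more involved than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a WiredTiger function): cache usage as an
 * integer percentage of the configured cache size.
 */
static uint64_t
cache_pct_full(uint64_t bytes_inuse, uint64_t cache_size)
{
	assert(cache_size > 0);
	return ((bytes_inuse * 100) / cache_size);
}

/*
 * Hypothetical helper: true if a single in-memory page is larger than
 * the whole cache, in which case the cache cannot drop below 100% full
 * while that page stays resident.
 */
static bool
page_exceeds_cache(uint64_t page_bytes, uint64_t cache_size)
{
	return (page_bytes > cache_size);
}
```

With a 2.5MB page (2621440 bytes) and a 1MB cache (1048576 bytes), cache_pct_full returns 250 and page_exceeds_cache returns true. Under this simplified model, a thread waiting for the cache to fall below its limit cannot make progress while that page remains resident.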
An oddity is that the session that is in __wt_txn_begin already has a snapshot allocated:
(gdb) p session->txn
$30 = {id = 0, isolation = WT_ISO_SNAPSHOT, snap_min = 4, snap_max = 4,
snapshot = 0x7f23b0032960, snapshot_count = 0, txn_logsync = 0, mod = 0x0,
mod_alloc = 0, mod_count = 0, logrec = 0x0, notify = 0x0, ckpt_lsn = {
file = 0, offset = 0}, full_ckpt = false, ckpt_nsnapshot = 0,
ckpt_snapshot = 0x0, flags = 8}
This is keeping the system-wide snap_min pinned at 4:
(gdb) p $3->txn_global
$22 = {current = 4, last_running = 4, oldest_id = 4, scan_count = 0, checkpoint_id = 0, checkpoint_gen = 0, checkpoint_pinned = 0, nsnap_rwlock = 0x2b731b0, nsnap_oldest_id = 0, nsnaph = {tqh_first = 0x0, tqh_last = 0x2b6b478}, states = 0x2b94d80}
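As a rough illustration of how a held snapshot can pin the oldest ID, the C sketch below takes the minimum snap_min over all sessions that hold a snapshot. The struct and function are hypothetical simplifications for this ticket, not WiredTiger's actual data structures or algorithm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified per-session transaction state. */
struct session_txn {
	uint64_t snap_min;	/* Oldest ID visible in the snapshot */
	int has_snapshot;	/* Non-zero if a snapshot is held */
};

/*
 * Hypothetical helper: the oldest ID the system must keep around is
 * the smallest snap_min of any session holding a snapshot, or the
 * current ID if no session holds one.
 */
static uint64_t
oldest_pinned_id(const struct session_txn *sessions, size_t n, uint64_t current)
{
	uint64_t oldest = current;
	size_t i;

	assert(sessions != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (sessions[i].has_snapshot && sessions[i].snap_min < oldest)
			oldest = sessions[i].snap_min;
	return (oldest);
}
```

In this model, a single session with snap_min = 4 keeps the result at 4 however far the current ID advances, which is the pinning behavior described above.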
- is depended on by
-
SERVER-22146 WiredTiger changes for 3.3.1
- Closed