This one fired quickly yesterday, but I didn't get a chance to look at it until today.
Here are the stacks:
#0  0x00000030f34ccdd7 in sched_yield () from /lib64/libc.so.6
#1  0x000000000042bee1 in __wt_yield () at ../src/os_posix/os_yield.c:17
#2  0x000000000044910e in __wt_page_in_func (session=0x1171520, parent=0x7f2ee8003890, ref=0x7f2ee8003970, file=0x591226 "../src/btree/row_srch.c", line=155) at ../src/btree/bt_page.c:81
#3  0x0000000000450c5f in __wt_row_search (session=0x1171520, cbt=0x11e9e90, is_modify=0) at ../src/btree/row_srch.c:155
#4  0x00000000004423d9 in __wt_btcur_search (cbt=0x11e9e90) at ../src/btree/bt_cursor.c:139
#5  0x000000000042480e in __curfile_search (cursor=0x11e9e90) at ../src/cursor/cur_file.c:81
#6  0x0000000000406bdf in wts_read (keyno=83) at ../../../test/format/wts.c:684
#7  0x00000000004069f7 in wts_ops () at ../../../test/format/wts.c:605
#8  0x0000000000404852 in main (argc=0, argv=0x7fff2eea9450) at ../../../test/format/t.c:108
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f2ef052e700 (LWP 14090))]#0  0x00000030f34de51
(gdb) where
#0  0x00000030f34de513 in select () from /lib64/libc.so.6
#1  0x000000000040b8f1 in __wt_sleep (seconds=100, micro_seconds=0) at ../src/os_posix/os_sleep.c:22
#2  0x000000000040e0a8 in __wt_attach (session=0x11716b0) at ../src/support/global.c:77
#3  0x000000000042b7f6 in __wt_abort (session=0x11716b0) at ../src/os_posix/os_abort.c:20
#4  0x000000000040de6a in __wt_assert (session=0x11716b0, error=0, file_name=0x58af54 "../src/btree/rec_evict.c", line_number=48, fmt=0x58af51 "%s") at ../src/support/err.c:158
#5  0x0000000000419bf0 in __wt_rec_evict (session=0x11716b0, page=0xe7e000, flags=0) at ../src/btree/rec_evict.c:48
#6  0x0000000000410c3e in __evict_request_walk (session=0x11716b0) at ../src/btree/bt_evict.c:368
#7  0x00000000004109d4 in __evict_worker (session=0x11716b0) at ../src/btree/bt_evict.c:284
#8  0x0000000000410899 in __wt_cache_evict_server (arg=0x11eb150) at ../src/btree/bt_evict.c:242
#9  0x00000030f38077f1 in start_thread () from /lib64/libpthread.so.0
#10 0x00000030f34e592d in clone () from /lib64/libc.so.6
Here's the page:
0xe7e000 [635392-635904, 512, 2072039998]: row-store leaf (dirty, empty)
    parent 0x7f2ee8003890, disk 0x1b72210, entries 1
    write/disk generations: 9/8
    empty page
    tracking list:
        block-evict [674816-675328, 512, 2022493247]
        block-evict [526848-527360, 512, 1442911139]
    K {0000000083.00/opqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefg}
    V {0000000084/LMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQ
    value {0000000084/LMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLM
    value {deleted}
    value {0000000084/LMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLM
    value {0000000084/LMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLM
    value {0000000084/LMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLM
    value {0000000084/LM}
    value {0000000084/LMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLM
    value {deleted}
    value {0000000084/LMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLM
WT_PAGE->flags == 0x05 (WT_PAGE_BUILD_KEYS, WT_PAGE_REC_EMPTY)
So, the eviction server thread is evicting a page, the page is empty, and the eviction code doesn't like that, because empty pages are supposed to be merged into their parents.
We know this page wasn't pushed onto the eviction queue as part of a close or sync operation, because the other thread is still running, which means evict_walk_file() selected this page for eviction. That function checks whether the page is empty, so I don't understand how this page could have gotten onto the eviction queue.
WT_PAGE_REC_EMPTY is only set in the reconciliation code, so this page must have been reconciled earlier, marked empty, and then had new content added.
So, how about this:
1. the page is reconciled and marked empty
2. the page gets enough updates that __wt_eviction_page_check() decides to force-evict it, calls __wt_evict_page_request() to put it on the queue, and wakes the eviction server
3. the eviction server wakes up, walks the request queue, finds the page, and calls __wt_rec_evict()
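To check that the sequence above is self-consistent, here is a toy model of it. Everything in it is hypothetical: the struct, the flag value, and the helper names are stand-ins, not WiredTiger code. The point it shows is that the file walk filters on the empty flag, while the forced-eviction request path (as hypothesized in step 2) only looks at size, so a page that was marked empty and then grew can reach the eviction check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value; the real WT_PAGE_REC_EMPTY constant may differ. */
#define PAGE_REC_EMPTY 0x04u

/* Hypothetical, heavily simplified page: just flags and a size. */
struct page {
    unsigned flags;
    size_t memory; /* bytes of in-memory content */
};

/* Step 1: reconciliation finds no live content and marks the page empty. */
static void reconcile_empty(struct page *p)
{
    p->flags |= PAGE_REC_EMPTY;
}

/* Step 2: new updates arrive; nothing clears the empty flag. */
static void add_updates(struct page *p, size_t bytes)
{
    assert(p != NULL);
    p->memory += bytes;
}

/* The file walk skips pages marked empty (they are meant to be merged). */
static bool walk_would_queue(const struct page *p)
{
    return (p->flags & PAGE_REC_EMPTY) == 0;
}

/* The forced-eviction request path (hypothesized) only looks at size. */
static bool request_would_queue(const struct page *p, size_t max_page)
{
    return p->memory > max_page;
}

/* Step 3: stand-in for the check in __wt_rec_evict(); the real code
 * asserts and aborts where this returns false. */
static bool evict_allowed(const struct page *p)
{
    return (p->flags & PAGE_REC_EMPTY) == 0;
}
```

Running steps 1 and 2 on one page gives a page the walk would skip but the request path would queue, and step 3 then rejects it, which matches the assertion in the second stack.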
The reason the code avoids putting empty pages (or split and split-merge pages, for that matter) on the eviction queue is the problem that arises if both a parent and its child are queued: thread 1 picks up the child for eviction, thread 2 picks up the parent, thread 2 evicts the parent and merges the child into it, and thread 1 is left holding a reference to a corrupted page.
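One way to rule out that parent/child case, sketched here only as a hypothetical and not as how WiredTiger does it, is to refuse to queue a page when the queue already holds that page, one of its ancestors, or one of its descendants. The struct and function names below are made up for illustration; the real queue and page structures look different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page: only the parent link matters here. */
struct page {
    struct page *parent; /* NULL for the root */
};

/* True if anc is a (possibly indirect) ancestor of p. */
static bool is_ancestor(const struct page *anc, const struct page *p)
{
    for (p = p->parent; p != NULL; p = p->parent)
        if (p == anc)
            return true;
    return false;
}

/* Hypothetical guard: true if queueing p would put it on the queue
 * alongside itself, an ancestor, or a descendant. */
static bool queue_conflicts(struct page *const *queue, size_t n,
    const struct page *p)
{
    assert(p != NULL);
    for (size_t i = 0; i < n; i++) {
        const struct page *q = queue[i];
        if (q == p || is_ancestor(q, p) || is_ancestor(p, q))
            return true;
    }
    return false;
}
```

The cost is a walk up the parent chain for every queued entry, which is probably fine for a short request queue but would need rethinking for a large one.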
So, we need to avoid that case somehow, but at the same time we need to be able to force out a page even if it was previously marked "empty". (I don't think this case will ever fire in the real world: it took an abnormally small cache, a single really busy page, and no other pages to flush to trigger it.)
Thoughts?