@michaelcahill, I just saw a core dump on bengal I haven't seen before:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffecdfa700 (LWP 4434)]
0x00000000004a28c0 in __ovfl_cache_row_visible (session=0x8ee090, page=0x7fff90007020, rip=0x7fff90007160) at ../src/btree/bt_ovfl.c:170
170	if (__wt_txn_visible_all(session, upd->txnid))
Function call stack:
(gdb) where
#0 0x00000000004a28c0 in __ovfl_cache_row_visible (session=0x8ee090, page=0x7fff90007020, rip=0x7fff90007160) at ../src/btree/bt_ovfl.c:170
#1 0x00000000004a2bee in __wt_val_ovfl_cache (session=0x8ee090, page=0x7fff90007020, cookie=0x7fff90007160, unpack=0x7fffecdf97d0) at ../src/btree/bt_ovfl.c:317
#2 0x000000000045bdc5 in __rec_row_leaf (session=0x8ee090, r=0x7fffe0007a60, page=0x7fff90007020, salvage=0x0) at ../src/btree/rec_write.c:3367
#3 0x0000000000455f6a in __wt_rec_write (session=0x8ee090, page=0x7fff90007020, salvage=0x0, flags=0) at ../src/btree/rec_write.c:329
#4 0x000000000043ee99 in __wt_sync_file (session=0x8ee090, syncop=1) at ../src/btree/bt_evict.c:529
#5 0x000000000044c5c3 in __wt_bt_cache_op (session=0x8ee090, ckptbase=0x0, op=1) at ../src/btree/bt_sync.c:64
#6 0x000000000043ac19 in __wt_checkpoint_write_leaves (session=0x8ee090, cfg=0x7fffecdf9c80) at ../src/txn/txn_ckpt.c:774
And, in that function, upd has been overwritten with text data:
(gdb) p upd
$15 = (WT_UPDATE *) 0x4e4d4c4b4a494847
(gdb) printf "%s\n", &upd
GHIJKLMN?
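To double-check that reading of the pointer, here is a minimal sketch that decodes a 64-bit value byte by byte, lowest byte first, which matches memory order on a little-endian machine (an assumption about bengal, not something the dump states). The function name pointer_bytes_as_text is hypothetical and not part of WiredTiger.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of WiredTiger): write the 8 bytes of a
 * 64-bit value into buf, lowest byte first, and NUL-terminate the result.
 * Bytes outside the printable ASCII range are shown as '.'.
 */
void
pointer_bytes_as_text(uint64_t value, char buf[9])
{
	size_t i;

	for (i = 0; i < 8; ++i) {
		/* Shift-and-mask keeps the result independent of host byte order. */
		unsigned char c = (unsigned char)((value >> (8 * i)) & 0xff);

		buf[i] = (c >= 0x20 && c <= 0x7e) ? (char)c : '.';
	}
	buf[8] = '\0';
}
```

Passing 0x4e4d4c4b4a494847 fills buf with the same eight letters gdb printed, since 0x47 is 'G' and 0x4e is 'N'.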
But the WT_UPDATE chain from first looks OK:
(gdb) p first
$10 = (WT_UPDATE *) 0x7fffcc010900
(gdb) p first->next
$11 = (WT_UPDATE *) 0x7fffac0075c0
(gdb) p $11->next
$12 = (WT_UPDATE *) 0x7fffb800e0a0
(gdb) p $12->next
$13 = (WT_UPDATE *) 0x7fffcc009ba0
(gdb) p $13->next
$14 = (WT_UPDATE *) 0x0
So, we're:
- writing a row-store leaf page in __rec_row_leaf, and
- discarding an overflow value from that page;
- to do that we call __wt_val_ovfl_cache, which acquires the btree overflow cache lock,
- and then calls __ovfl_cache_row_visible, which walks the list of WT_UPDATE structures for the page entry to see if there's a globally visible update.
I'm guessing that we were walking the WT_UPDATE list when one of the structures was freed and re-purposed, so upd = upd->next left us with garbage in upd?
We're not holding the serial function lock here, but that should be safe, since we're not supposed to get beyond a WT_UPDATE structure that's globally visible?
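For reference, the walk and the invariant it relies on can be sketched as below. This is a simplified illustration, not the code in bt_ovfl.c: struct update is a stand-in for WT_UPDATE with only two fields, and update_visible_all, first_globally_visible and the "txnid older than oldest_id" rule are assumptions made for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for WT_UPDATE: only the fields the walk needs. */
struct update {
	uint64_t txnid;		/* transaction that created this update */
	struct update *next;	/* next (assumed older) update in the chain */
};

/*
 * Assumed visibility rule for this sketch: an update is globally visible
 * when its transaction ID is older than the oldest ID any running
 * transaction could still need.
 */
int
update_visible_all(uint64_t oldest_id, uint64_t txnid)
{
	return (txnid < oldest_id);
}

/*
 * Walk the chain from the head and return the first globally visible
 * update, or NULL if there is none. The walk stops at that update and
 * never dereferences anything beyond it.
 */
struct update *
first_globally_visible(struct update *first, uint64_t oldest_id)
{
	struct update *upd;

	for (upd = first; upd != NULL; upd = upd->next)
		if (update_visible_all(oldest_id, upd->txnid))
			return (upd);
	return (NULL);
}
```

If the entries after the first globally visible update are the only ones that can be freed, a walk like this stays on live memory without the lock. The crash suggests that either an entry before that point was freed, or the walk went past it.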
Is related to:
- WT-643 test/format failure: illegal cell and page type combination (Closed)