- Type: Task
- Resolution: Done
- Affects Version/s: None
- Component/s: None
The parallel-pop-lsm wtperf configuration is consistently dropping core.
Here's the stack and other useful info.
(gdb) p *lsm_tree
$3 =
Note lsm_tree->nchunks here is 36.
(gdb) bt
#0 0x000000000046cf90 in __clsm_open_cursors (clsm=0x7f2be4002b00, update=1,
    start_chunk=0, start_id=0) at ../src/lsm/lsm_cursor.c:339
#1 0x000000000046c4da in __clsm_enter (clsm=0x7f2be4002b00, update=1)
    at ../src/lsm/lsm_cursor.c:93
#2 0x000000000046f86e in __clsm_insert (cursor=0x7f2be4002b00)
    at ../src/lsm/lsm_cursor.c:1048
#3 0x00000000004042b4 in populate_thread (arg=0x7fff485b5e90)
    at ../../../bench/wtperf/wtperf.c:467
#4 0x00007f2c084dcc6b in start_thread () from /lib64/libpthread.so.0
#5 0x00007f2c080215ed in clone () from /lib64/libc.so.6
(gdb) p *clsm
$4 =
Note clsm->nchunks is 50 here.
(gdb) p i
$5 = 49
(gdb) p skip_chunks
$6 = 49
Clearly when we execute:
chunk = lsm_tree->chunk[i + start_chunk];
we're way beyond the end of the lsm_tree->chunk array: with i = 49 and start_chunk = 0 the index is 49, but lsm_tree->nchunks is only 36.
I debugged this. The issue is that we drop the lock to close the cursors, and while the lock is released the number of chunks in the lsm_tree changes and shrinks. The skip_chunks value computed before the lock was dropped is therefore no longer valid.
I have a fix I'm trying.
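For illustration only, here is a minimal sketch of the general pattern the analysis above points at: values cached under a lock are re-validated after the lock is reacquired, and the computation is retried if the tree shrank in the meantime. This is not the actual fix. The struct tree and struct cursor types, count_skip, open_cursors, and the while_unlocked hook are all hypothetical stand-ins, and the hook simulates a concurrent merge in a single thread instead of using real locking.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHUNKS 64

/* Hypothetical stand-ins for illustration; not WiredTiger's real structures. */
struct tree {
    int nchunks;                /* valid entries in chunk[] */
    int chunk[MAX_CHUNKS];      /* chunk ids, oldest first */
};

struct cursor {
    int nchunks;                /* chunks this cursor currently has open */
    int chunk[MAX_CHUNKS];      /* ids of those chunks */
};

/* Stands in for another thread changing the tree while the lock is released. */
typedef void (*unlocked_hook)(struct tree *);

/* Example hook: a merge that leaves the tree with at most 36 chunks. */
static void
shrink_to_36(struct tree *t)
{
    if (t->nchunks > 36)
        t->nchunks = 36;
}

/* Count the leading chunks the cursor and tree share (lock assumed held). */
static int
count_skip(const struct cursor *c, const struct tree *t)
{
    int n = 0;

    while (n < c->nchunks && n < t->nchunks && c->chunk[n] == t->chunk[n])
        ++n;
    return (n);
}

/*
 * Bring the cursor in line with the tree. Returns the number of retries
 * needed because the tree shrank while the lock was released.
 */
static int
open_cursors(struct cursor *c, struct tree *t, unlocked_hook while_unlocked)
{
    int i, nchunks, retries, skip;

    for (retries = 0;; ++retries) {
        /* Both values are snapshots taken while the lock is held. */
        nchunks = t->nchunks;
        skip = count_skip(c, t);

        /* Lock released here to close the cursor's stale chunks. */
        c->nchunks = skip;
        if (while_unlocked != NULL)
            while_unlocked(t);
        /* Lock reacquired here. */

        /* If the tree shrank meanwhile, the snapshots are stale: retry. */
        if (t->nchunks >= nchunks)
            break;
    }

    /* Every index below is within the tree's current chunk count. */
    assert(nchunks <= t->nchunks);
    for (i = skip; i < nchunks; ++i)
        c->chunk[i] = t->chunk[i];
    c->nchunks = nchunks;
    return (retries);
}
```

With a 50-chunk cursor whose first 49 chunks match the tree and a hook that shrinks the tree to 36 chunks, the first pass computes skip = 49, notices the shrink after "relocking", and recomputes against the smaller tree instead of indexing past its end. Note that the sketch only checks for a reduced chunk count; other kinds of change made while the lock is released would need their own validation.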
- is related to
-
WT-647 Retry if releasing lock reduced nchunks. (WT-646)
- Closed
- related to
-
WT-1 placeholder WT-1
- Closed
-
WT-2 What does metadata look like?
- Closed
-
WT-3 What file formats are required?
- Closed
-
WT-4 Flexible cursor traversals
- Closed
-
WT-5 How does pget work: is it necessary?
- Closed