-
Type: Task
-
Resolution: Done
-
Affects Version/s: None
-
Component/s: None
-
None
In running the evict-lsm wtperf config on the big/original AWS HD system, I get the following assertion failure with the current end of develop (changeset 6fe31533cbc60171908361de260262a537debd33).
This is happening between the populate phase and workload phase. There are only 3 threads present.
file:test-000164.lsm: ../src/lsm/lsm_cursor.c, 400: F_ISSET(chunk, WT_LSM_CHUNK_BLOOM) ? clsm->blooms[i] != NULL : clsm->blooms[i] == NULL
file:test-000164.lsm: aborting WiredTiger library
Attaching in the debugger:
(gdb) thread apply all bt
Thread 3 (Thread 0x7fbb1faa1700 (LWP 6394)):
#0 0x00007fbb2076dc6b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
WT-1 0x0000000000414f29 in __wt_cond_wait (session=0x1902060, cond=0x1908ab0, usecs=100000) at ../src/os_posix/os_mtx.c:75
WT-2 0x00000000004306c7 in __wt_cache_evict_server (arg=0x1902060) at ../src/btree/bt_evict.c:167
WT-3 0x00007fbb20769c6b in start_thread () from /lib64/libpthread.so.0
WT-4 0x00007fbb200a25ed in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7fbb1da9d700 (LWP 6398)):
#0 0x00007fbb2009b7b3 in select () from /lib64/libc.so.6
WT-1 0x0000000000416875 in __wt_sleep (seconds=100, micro_seconds=0) at ../src/os_posix/os_sleep.c:22
WT-2 0x0000000000426cd1 in __wt_attach (session=0x1902cc0) at ../src/support/global.c:97
WT-3 0x000000000047700a in __wt_abort (session=0x1902cc0) at ../src/os_posix/os_abort.c:21
WT-4 0x0000000000426721 in __wt_assert (session=0x1902cc0, error=0, file_name=0x4b1eb6 "../src/lsm/lsm_cursor.c", line_number=400,
fmt=0x4b1eb3 "%s") at ../src/support/err.c:408
WT-5 0x000000000046f22a in __clsm_open_cursors (clsm=0x7fbb001eb190, update=0, start_chunk=0, start_id=12) at ../src/lsm/lsm_cursor.c:399
WT-6 0x000000000046f316 in __wt_clsm_init_merge (cursor=0x7fbb001eb190, start_chunk=0, start_id=12, nchunks=15) at ../src/lsm/lsm_cursor.c:424
WT-7 0x0000000000473538 in __wt_lsm_merge (session=0x1902cc0, lsm_tree=0x1933040, id=0, stalls=0) at ../src/lsm/lsm_merge.c:213
WT-8 0x00000000004108b4 in __wt_lsm_merge_worker (vargs=0x1933fe0) at ../src/lsm/lsm_worker.c:87
WT-9 0x00007fbb20769c6b in start_thread () from /lib64/libpthread.so.0
WT-10 0x00007fbb200a25ed in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7fbb2121a740 (LWP 6393)):
#0 0x00007fbb2076adaf in pthread_join () from /lib64/libpthread.so.0
WT-1 0x0000000000416912 in __wt_thread_join (session=0x1901e50, tid=140441633281792) at ../src/os_posix/os_thread.c:35
WT-2 0x000000000040e0be in __lsm_tree_close (session=0x1901e50, lsm_tree=0x1933040) at ../src/lsm/lsm_tree.c:87
WT-3 0x000000000040e4f0 in __wt_lsm_tree_close_all (session=0x1901e50) at ../src/lsm/lsm_tree.c:166
WT-4 0x000000000040c3f3 in __wt_connection_close (conn=0x19010f0) at ../src/conn/conn_open.c:83
WT-5 0x0000000000408862 in __conn_close (wt_conn=0x19010f0, config=0x0) at ../src/conn/conn_api.c:518
WT-6 0x000000000040520d in main (argc=5, argv=0x7fff89605828) at ../../../bench/wtperf/wtperf.c:1093
(gdb) p/x *chunk
$1 = {id = 0x1d, generation = 0x1, uri = 0x7fbb00050fe0, bloom_uri = 0x7fbb000990a0, count = 0x2c00e, create_ts =
, txnid_max = 0x0, flags = 0x1d}
(Flags field translates to: BLOOM | EVICTED | MERGING | ONDISK)
(gdb) p/x *clsm
$2 = {iface = {session = 0x1902cc0, uri = 0x19329d0, key_format = 0x1932890, value_format = 0x19328b0, get_key = 0x465d7d,
get_value = 0x46692b, set_key = 0x465e40, set_value = 0x466c92, compare = 0x46f5f9, next = 0x46f9d6, prev = 0x46fea3, reset = 0x470423,
search = 0x470519, search_near = 0x470a9d, insert = 0x47199c, update = 0x471eb9, remove = 0x4723be, close = 0x47275c, q =
, recno = 0x0, raw_recno_buf =
{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, key =
{ data = 0x0, size = 0x0, flags = 0x0, mem = 0x0, memsize = 0x0}, value =
{data = 0x0, size = 0x0, flags = 0x0, mem = 0x0, memsize = 0x0}, saved_err = 0x0, flags = 0x380}, lsm_tree = 0x1933040, dsk_gen = 0x6a2, nchunks = 0xf, nupdates = 0x0,
blooms = 0x7fbb005fdbe0, bloom_alloc = 0x78, cursors = 0x7fbb00655570, cursor_alloc = 0x78, current = 0x0, primary_chunk = 0x0,
txnid_max = 0x0, txnid_alloc = 0x0, flags = 0x24}
(Flags field translates to: MERGE | OPEN_READ)
(gdb) p i
$3 = 1
(gdb) x/4x clsm->blooms
0x7fbb005fdbe0: 0x00000000 0x00000000 0x00000000 0x00000000
That bloom filter does exist:
rw-rw-r- 1 sue sue 196608 Sep 11 16:05 test-000140.bf
rw-rw-r- 1 sue sue 27136000 Sep 11 16:05 test-000140.lsm
rw-rw-r- 1 sue sue 184320 Sep 11 16:05 test-000164.bf
rw-rw-r- 1 sue sue 25268224 Sep 11 16:05 test-000164.lsm
rw-rw-r- 1 sue sue 143360 Sep 11 16:05 test-000171.bf
rw-rw-r- 1 sue sue 19013632 Sep 11 16:05 test-000171.lsm
The number of LSM files goes up into the high 800s:
rw-rw-r- 1 sue sue 1605632 Sep 11 16:06 test-000886.lsm
rw-rw-r- 1 sue sue 1880064 Sep 11 16:06 test-000887.lsm
rw-rw-r- 1 sue sue 1720320 Sep 11 16:06 test-000888.lsm
rw-rw-r- 1 sue sue 4096 Sep 11 16:06 test-000894.lsm
rw-rw-r- 1 sue sue 151552 Sep 11 16:06 test-000895.bf
rw-rw-r- 1 sue sue 20480000 Sep 11 16:06 test-000895.lsm
Looking at __clsm_open_cursors, the code in the "open cursors" loop has this conditional:
if (F_ISSET(chunk, WT_LSM_CHUNK_BLOOM) &&
!F_ISSET(clsm, WT_CLSM_MERGE))
WT_ERR(__wt_bloom_open(session, chunk->bloom_uri,
lsm_tree->bloom_bit_count,
lsm_tree->bloom_hash_count,
c, &clsm->blooms[i]));
The code in the diagnostic check has:
/_ Make sure the Bloom config matches. _/
WT_ASSERT(session, F_ISSET(chunk, WT_LSM_CHUNK_BLOOM) ?
clsm->blooms[i] != NULL : clsm->blooms[i] == NULL);
The naive/obvious fix is to make the WT_ASSERT conditional match the earlier code.
I will first test for reproducibility and then test that fix.
- related to
-
WT-1 placeholder WT-1
- Closed
-
WT-2 What does metadata look like?
- Closed
-
WT-3 What file formats are required?
- Closed
-
WT-4 Flexible cursor traversals
- Closed
-
WT-5 How does pget work: is it necessary?
- Closed
-
WT-6 Complex schema example
- Closed
-
WT-7 Do we need the handle->err/errx methods?
- Closed
-
WT-8 Do we need table load, bulk-load and/or dump methods?
- Closed
-
WT-9 Does adding schema need to be transactional?
- Closed
-
WT-10 Basic "getting started" tutorial
- Closed