Type: Task
Resolution: Done
Affects Version/s: None
Component/s: None
I came across a hang running test/format with LSM. It doesn't seem to be directly related to LSM.
I see the following stack traces when the application is hung:
(gdb) thread apply all where
Thread 2 (Thread 0x7ffff77d6700 (LWP 45983)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
#1  0x00000000004238c3 in __wt_cond_wait (session=0x8c46c0, cond=0x8cb320, usecs=100000) at ../src/os_posix/os_mtx.c:75
#2  0x000000000043e055 in __wt_cache_evict_server (arg=0x8c46c0) at ../src/btree/bt_evict.c:167
#3  0x000000383c007d15 in start_thread (arg=0x7ffff77d6700) at pthread_create.c:308
#4  0x000000383b8f248d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:114
Thread 1 (Thread 0x7ffff7de1740 (LWP 45974)):
#0  pthread_rwlock_wrlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_wrlock.S:85
#1  0x0000000000423eb0 in __wt_writelock (session=0x8c4cf0, rwlock=0x7ffff0410060) at ../src/os_posix/os_mtx.c:239
#2  0x000000000046487c in __wt_conn_btree_close (session=0x8c4cf0, locked=0) at ../src/conn/conn_dhandle.c:480
#3  0x0000000000433c4d in __wt_session_discard_btree (session=0x8c4cf0, dhandle_cache=0x0) at ../src/session/session_dhandle.c:323
#4  0x00000000004300a9 in __session_close_cache (session=0x8c4cf0) at ../src/session/session_api.c:39
#5  0x0000000000430375 in __session_close (wt_session=0x8c4cf0, config=0x0) at ../src/session/session_api.c:82
#6  0x000000000041d3dd in __lsm_tree_close (session=0x8c48d0, lsm_tree=0x8e2440) at ../src/lsm/lsm_tree.c:139
#7  0x000000000041d48a in __wt_lsm_tree_close_all (session=0x8c48d0) at ../src/lsm/lsm_tree.c:166
#8  0x000000000041b53f in __wt_connection_close (conn=0x8c2420) at ../src/conn/conn_open.c:94
#9  0x0000000000417580 in __conn_close (wt_conn=0x8c2420, config=0x0) at ../src/conn/conn_api.c:386
#10 0x0000000000415789 in wts_close () at ../../../test/format/wts.c:269
#11 0x0000000000413a25 in main (argc=0, argv=0x7fffffffe370) at ../../../test/format/t.c:158
Thread 1 is the interesting one: it is trying to acquire a write lock on a dhandle whose reference count is 0, yet the write lock cannot be acquired.
The dhandle does not refer to the metadata file, and the session is not the default session.
I suspect that either we fail to clean up after an error returned from a __wt_session_lock_btree call (though in that case I'd expect to see a different error reported earlier), or we race while opening a handle and leave it in a bad state.
The config I used to produce this is:
############################################
# RUN PARAMETERS
############################################
# bitcnt not applicable to this run
cache=94
compression=bzip
data_extend=0
data_source=lsm
delete_pct=14
dictionary=0
file_type=row-store
huffman_key=0
huffman_value=0
insert_pct=40
internal_key_truncation=0
internal_page_max=14
key_gap=4
key_max=102
key_min=27
leaf_page_max=21
ops=382650
prefix=1
repeat_data_pct=37
reverse=0
rows=600
runs=0
split_pct=65
threads=10
value_max=2186
value_min=3
#wiredtiger_config=lsm_merge=false
write_pct=5
############################################
The most notable setting is data_source=lsm, i.e. the run is configured to use LSM.