I noticed today that the Jenkins job for medium-lsm-compact was hung. It is presumably a deadlock around the schema and dhandle locks. The stacks from the job do not include line numbers. Here are the stacks of the waiting threads:
Thread 10 (Thread 0x7f4904bfe700 (LWP 17853)):
#0 0x00007f4906164265 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f490615fdc1 in _L_lock_816 () from /lib64/libpthread.so.0
#2 0x00007f490615fcc7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000000000481ee4 in __wt_conn_dhandle_discard_single ()
#4 0x00000000004117cc in __sweep_server ()
#5 0x00007f490615df18 in start_thread () from /lib64/libpthread.so.0
#6 0x00007f4905e93b2d in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f49033fb700 (LWP 17856)):
#0 0x00007f4906164265 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f490615fdc1 in _L_lock_816 () from /lib64/libpthread.so.0
#2 0x00007f490615fcc7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00000000004211af in __lsm_worker_manager ()
#4 0x00007f490615df18 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f4905e93b2d in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f4902bfa700 (LWP 17857)):
#0 0x00007f4905e7cb97 in sched_yield () from /lib64/libc.so.6
#1 0x0000000000480bd7 in __conn_dhandle_open_lock ()
#2 0x00000000004815b7 in __wt_conn_btree_get ()
#3 0x000000000044b492 in __wt_session_get_btree ()
#4 0x0000000000481d8f in __wt_conn_dhandle_close_all ()
#5 0x00000000004419bf in __wt_schema_drop ()
#6 0x000000000049fa1f in __lsm_drop_file ()
#7 0x00000000004a056d in __wt_lsm_free_chunks ()
#8 0x00000000004255b9 in __lsm_worker ()
#9 0x00007f490615df18 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f4905e93b2d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f4901bff700 (LWP 17858)):
#0 0x00007f4906164265 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f490615fdc1 in _L_lock_816 () from /lib64/libpthread.so.0
#2 0x00007f490615fcc7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x000000000044b3a3 in __wt_session_get_btree ()
#4 0x000000000044b6d2 in __wt_session_get_btree_ckpt ()
#5 0x000000000048b13f in __wt_curfile_open ()
#6 0x00000000004498d0 in __wt_open_cursor ()
#7 0x0000000000449b25 in __session_open_cursor ()
#8 0x00000000004b04e4 in __wt_bloom_finalize ()
#9 0x000000000049ffbf in __wt_lsm_work_bloom ()
#10 0x00000000004255f5 in __lsm_worker ()
#11 0x00007f490615df18 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f4905e93b2d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f4900bfd700 (LWP 17860)):
#0 0x00007f4906164265 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f490615fdc1 in _L_lock_816 () from /lib64/libpthread.so.0
#2 0x00007f490615fcc7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000000000448706 in __session_create ()
#4 0x00000000004b04bb in __wt_bloom_finalize ()
#5 0x000000000049e5d2 in __wt_lsm_merge ()
#6 0x00000000004254d7 in __lsm_worker ()
#7 0x00007f490615df18 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f4905e93b2d in clone () from /lib64/libc.so.6
I will try to repro on the AWS HDD machine.
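In case it helps triage, here is a rough sketch of a script that reduces a gdb dump like the one above to the first non-wait frame per thread, i.e. the WiredTiger function each thread is blocked in. It assumes standard `thread apply all bt` output with `#N` frame numbers. The names `blocked_callers` and `is_wait_frame` and the list of wait primitives are my own choices for illustration, not anything from the WiredTiger tree.

```python
import re

# "Thread 10 (Thread 0x7f... (LWP 17853)):"
THREAD_RE = re.compile(r"^Thread (\d+) \(")
# "#3  0x0000000000481ee4 in __wt_conn_dhandle_discard_single ()"
# The address is optional because gdb omits it for some frames.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(\S+) \(")

# Assumption: these are the only lock/yield primitives we need to skip.
WAIT_NAMES = {"__lll_lock_wait", "pthread_mutex_lock", "sched_yield"}


def is_wait_frame(func):
    """True for libc/libpthread frames that only show the thread is waiting."""
    return func in WAIT_NAMES or func.startswith("_L_lock_")


def blocked_callers(dump):
    """Map each thread id to the first frame that is not a wait primitive."""
    callers = {}
    current = None
    for raw in dump.splitlines():
        line = raw.strip()
        thread = THREAD_RE.match(line)
        if thread:
            current = int(thread.group(1))
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None and current not in callers:
            func = frame.group(2)
            if not is_wait_frame(func):
                callers[current] = func
    return callers
```

Running `blocked_callers()` over the saved backtrace text gives one entry per thread, which makes it easier to see which threads are waiting in the sweep server, the LSM manager, and the LSM workers without reading every frame.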
Is related to:
WT-1819 Split sweep into two passes (Closed)
Related to:
WT-1811 Change sweep to not wait on the dhandle list lock (Closed)