Fault injection (WT-32) identified this as a potential bug. Details follow:
While closing the cursors (and hence writing a checkpoint), a fault was injected to fail ftruncate, which caused the application to hang in the non-debug build. The following backtraces were taken a few seconds apart while the application appeared hung:
1:__session_close,__conn_close,start_run,start_all_runs,main
1:pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait_signal,__wt_cond_wait,__sweep_server,start_thread,clone
1:pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait_signal,__wt_cond_auto_wait_signal,__wt_cond_auto_wait,__wt_evict_thread_run,__wt_thread_run,start_thread,clone

1:__session_close,__conn_close,start_run,start_all_runs,main
1:pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait_signal,__wt_cond_wait,__sweep_server,start_thread,clone
1:pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait_signal,__wt_cond_auto_wait_signal,__wt_cond_auto_wait,__wt_evict_thread_run,__wt_thread_run,start_thread,clone

1:__strcmp_sse2_unaligned,__session_close,__conn_close,start_run,start_all_runs,main
1:pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait_signal,__wt_cond_wait,__sweep_server,start_thread,clone
1:pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait_signal,__wt_cond_auto_wait_signal,__wt_cond_auto_wait,__wt_evict_thread_run,__wt_thread_run,start_thread,clone
The fault was induced by failing ftruncate when it was called with the following backtrace:
ftruncate __posix_file_truncate __wt_ftruncate __wt_block_truncate __wt_block_extlist_truncate __ckpt_process __wt_block_checkpoint __bm_checkpoint __rec_write_wrapup __wt_reconcile __wt_evict_file __wt_cache_op __checkpoint_tree __wt_checkpoint_close __wt_conn_btree_sync_and_close __wt_session_release_btree __curfile_close __session_close __conn_close start_run
Until the fault-injection infrastructure becomes available, the following command reproduces this bug, provided you have the fault-injection library:
FAULTINJECT_LIBRARY_NAME='__wt' LD_LIBRARY_PATH='/path/to/fi-lib/.libs:/path/to/wiredtiger/build_posix/.libs' LD_PRELOAD='/path/to/fi-lib/.libs/libfaultinject.so' FAULTINJECT_FAIL_COUNT=113 ./bench/wtperf/wtperf -O ../bench/wtperf/runners/medium-btree.wtperf -o verbose=2
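For readers without the fault-injection library, the following is a minimal sketch of the general idea behind the command above: count calls and fail one of them. It is not the library's actual implementation. The names fi_calls, fi_should_fail, fi_fail_count_from_env and fi_ftruncate are hypothetical, and both the reading of FAULTINJECT_FAIL_COUNT as "fail the Nth call" and the choice of EIO as the injected error are assumptions made for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical state: number of wrapped calls seen so far (not thread-safe). */
static unsigned long fi_calls;

/*
 * Return 1 if the current call should fail. Calls are counted from 1;
 * a fail_count of 0 disables injection. ASSUMPTION: the real library's
 * counting rule may differ.
 */
static int fi_should_fail(unsigned long fail_count)
{
    ++fi_calls;
    return fail_count != 0 && fi_calls == fail_count;
}

/* Read the target call number from the environment; 0 if unset. */
static unsigned long fi_fail_count_from_env(void)
{
    const char *s = getenv("FAULTINJECT_FAIL_COUNT");
    return s == NULL ? 0 : strtoul(s, NULL, 10);
}

/*
 * Hypothetical wrapper around ftruncate: fails the chosen call with EIO
 * (an assumed error code) and forwards every other call unchanged.
 */
int fi_ftruncate(int fd, off_t length)
{
    if (fi_should_fail(fi_fail_count_from_env())) {
        fprintf(stderr, "fault injected into ftruncate (call %lu)\n", fi_calls);
        errno = EIO;
        return -1;
    }
    return ftruncate(fd, length);
}
```

A real LD_PRELOAD interposer would instead export a symbol named ftruncate and forward to the original through dlsym(RTLD_NEXT, "ftruncate"). The explicit wrapper is used here only so the sketch compiles and can be exercised without preloading.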