  1. WiredTiger
  2. WT-3268

Failure to close a cursor can leave WiredTiger stuck in a cursor-close loop

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • WT2.9.3, 3.5.9
    • Affects Version/s: None
    • Component/s: None
    • None
    • Storage 2017-04-17, Storage 2017-05-08

      Fault injection (WT-32) identified this as a potential bug. Here are the details:

      While closing the cursors (and hence writing a checkpoint), a fault was injected to make ftruncate fail, which caused the application to hang in the non-debug build. The following backtraces were captured a few seconds apart while the application appeared hung:

      1:__session_close,__conn_close,start_run,start_all_runs,main
      1:pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait_signal,__wt_cond_wait,__sweep_server,start_thread,clone
      1:pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait_signal,__wt_cond_auto_wait_signal,__wt_cond_auto_wait,__wt_evict_thread_run,__wt_thread_run,start_thread,clone
      
      1:__session_close,__conn_close,start_run,start_all_runs,main
      1:pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait_signal,__wt_cond_wait,__sweep_server,start_thread,clone
      1:pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait_signal,__wt_cond_auto_wait_signal,__wt_cond_auto_wait,__wt_evict_thread_run,__wt_thread_run,start_thread,clone
      
      1:__strcmp_sse2_unaligned,__session_close,__conn_close,start_run,start_all_runs,main
      1:pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait_signal,__wt_cond_wait,__sweep_server,start_thread,clone
      1:pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait_signal,__wt_cond_auto_wait_signal,__wt_cond_auto_wait,__wt_evict_thread_run,__wt_thread_run,start_thread,clone
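
      The backtraces show the main thread still inside __session_close each time, which fits a close loop that never makes progress. The report does not state the root cause, so the following is only a toy model of one way such a loop can arise: a close routine that returns on error before unlinking the cursor from the session's list. All names (toy_cursor, toy_session, toy_close_buggy, toy_close_safe, toy_session_close) are hypothetical and are not WiredTiger's actual structures or functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT WiredTiger's real structures. */
struct toy_cursor {
    struct toy_cursor *next;
    bool close_fails; /* simulates an injected fault during close */
};

struct toy_session {
    struct toy_cursor *cursors; /* head of the open-cursor list */
};

/*
 * Buggy close: on error, returns before unlinking the cursor,
 * so the cursor stays at the head of the list.
 * The cursor passed in must be the current list head.
 */
static int toy_close_buggy(struct toy_session *s, struct toy_cursor *c)
{
    if (c->close_fails)
        return -1;
    s->cursors = c->next;
    return 0;
}

/* Safer close: always unlink the head first, then report any error. */
static int toy_close_safe(struct toy_session *s, struct toy_cursor *c)
{
    s->cursors = c->next;
    return c->close_fails ? -1 : 0;
}

typedef int (*toy_close_fn)(struct toy_session *, struct toy_cursor *);

/*
 * Close cursors until the list is empty. Returns the number of
 * iterations, or -1 if max_iters is reached with cursors still open
 * (the point at which an uncapped loop would spin forever).
 */
static int toy_session_close(struct toy_session *s, toy_close_fn close_fn,
                             int max_iters)
{
    int iters = 0;

    while (s->cursors != NULL) {
        if (iters == max_iters)
            return -1;
        (void)close_fn(s, s->cursors); /* error ignored, as in a teardown path */
        iters++;
    }
    return iters;
}
```

      With a failing cursor at the head of the list, toy_session_close with toy_close_buggy hits the iteration cap, while toy_close_safe drains the list in one pass per cursor. The cap exists only so the sketch terminates; it is not a suggested fix.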
      

      The fault was induced by failing ftruncate when it was called with the following backtrace:

      ftruncate
      __posix_file_truncate
      __wt_ftruncate
      __wt_block_truncate
      __wt_block_extlist_truncate
      __ckpt_process
      __wt_block_checkpoint
      __bm_checkpoint
      __rec_write_wrapup
      __wt_reconcile
      __wt_evict_file
      __wt_cache_op
      __checkpoint_tree
      __wt_checkpoint_close
      __wt_conn_btree_sync_and_close
      __wt_session_release_btree
      __curfile_close
      __session_close
      __conn_close
      start_run
      

      Until the fault-injection infrastructure is available, the following command reproduces the bug, given a copy of the fault-injection library:

      FAULTINJECT_LIBRARY_NAME='__wt' LD_LIBRARY_PATH='/path/to/fi-lib/.libs:/path/to/wiredtiger/build_posix/.libs' LD_PRELOAD='/path/to/fi-lib/.libs/libfaultinject.so' FAULTINJECT_FAIL_COUNT=113 ./bench/wtperf/wtperf -O ../bench/wtperf/runners/medium-btree.wtperf -o verbose=2
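
      The command above relies on a preloaded library to fail a chosen call. The library's implementation is not shown in this report, so the sketch below only illustrates the general idea of a counter-based injector, under the assumption that FAULTINJECT_FAIL_COUNT selects which call should fail. The names toy_injector, toy_should_fail and toy_ftruncate are hypothetical, and the use of EIO as the injected error is likewise an assumption.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical counter-based injector; NOT the actual libfaultinject code. */
struct toy_injector {
    long fail_at; /* 1-based number of the call to fail; 0 disables injection */
    long calls;   /* calls observed so far */
};

/* Count this call and report whether it is the one chosen to fail. */
static int toy_should_fail(struct toy_injector *inj)
{
    inj->calls++;
    return inj->fail_at != 0 && inj->calls == inj->fail_at;
}

/*
 * Wrapper around ftruncate(2): fails the selected call with EIO
 * (an assumed error code) and forwards every other call unchanged.
 */
static int toy_ftruncate(struct toy_injector *inj, int fd, off_t len)
{
    if (toy_should_fail(inj)) {
        errno = EIO;
        return -1;
    }
    return ftruncate(fd, len);
}
```

      A real LD_PRELOAD shim would instead define a function with the same name as the intercepted call and forward to the original implementation; this sketch keeps the counter logic separate so it can be exercised directly.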
      

            Assignee: Sulabh Mahajan (sulabh.mahajan@mongodb.com)
            Reporter: Sulabh Mahajan (sulabh.mahajan@mongodb.com)