WiredTiger / WT-3504

Deadlock in wiredtiger

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Critical - P2
    • Affects Version/s: WT2.9.3
    • Component/s: None
    • Environment:

      I upgraded wiredtiger from v2.9.2 to the current version, v2.9.3, and started hitting deadlocks.

      Usage is simple (just for indexes):

      • I have one table with one column, "key"
      • Many requests to wiredtiger are binary searches over a key range
      • A single thread writes keys to the table or removes them
      • A single thread makes checkpoints and backups on a schedule
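
The access pattern in the list above can be modelled roughly as follows. This is only an in-memory stand-in for illustration: it does not use wiredtiger or WiredTigerNet, and the `KeyIndex` class and its method names are hypothetical. It shows a single sorted key column, range lookups by binary search, and insert/remove operations of the kind the single writer thread would issue.

```python
import bisect
import threading


class KeyIndex:
    """Hypothetical in-memory stand-in for a one-column table of sorted keys."""

    def __init__(self):
        self._keys = []                # kept sorted at all times
        self._lock = threading.Lock()  # guards _keys for readers and the writer

    def insert(self, key):
        """Add key if it is not already present."""
        with self._lock:
            i = bisect.bisect_left(self._keys, key)
            if i == len(self._keys) or self._keys[i] != key:
                self._keys.insert(i, key)

    def remove(self, key):
        """Remove key if it is present."""
        with self._lock:
            i = bisect.bisect_left(self._keys, key)
            if i < len(self._keys) and self._keys[i] == key:
                del self._keys[i]

    def range(self, low, high):
        """Return all keys k with low <= k < high, found by binary search."""
        with self._lock:
            start = bisect.bisect_left(self._keys, low)
            end = bisect.bisect_left(self._keys, high)
            return self._keys[start:end]
```

In the real application the range lookups run on many thread-pool threads at once, while writes and checkpoints each come from one dedicated thread.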

      For about a day of running, the application is fine. But once wiredtiger enters some checkpoint phase, I get a deadlock with v2.9.3.

      I inspected the call stacks with Sysinternals Process Explorer and got a stack like this:

      ntoskrnl.exe!KeSynchronizeExecution+0x2246
      ntoskrnl.exe!KeWaitForMultipleObjects+0x135e
      ntoskrnl.exe!KeWaitForMultipleObjects+0xdd9
      ntoskrnl.exe!KeWaitForMutexObject+0x373
      ntoskrnl.exe!KeStallWhileFrozen+0x1977
      ntoskrnl.exe!PoStartNextPowerIrp+0x109d
      ntoskrnl.exe!KeWaitForMultipleObjects+0x152f
      ntoskrnl.exe!KeWaitForMultipleObjects+0xdd9
      ntoskrnl.exe!FsRtlGetNextBaseMcbEntry+0x327
      ntoskrnl.exe!RtlAddAce+0x14a
      ntoskrnl.exe!setjmpex+0x34a3
      ntdll.dll!NtWaitForAlertByThreadId+0xa
      ntdll.dll!RtlSleepConditionVariableCS+0xc2
      KERNELBASE.dll!SleepConditionVariableCS+0x28
      WiredTigerNet.dll+0x32e9a
      WiredTigerNet.dll+0x42472
      WiredTigerNet.dll+0x4270f
      WiredTigerNet.dll+0x446da
      WiredTigerNet.dll+0x34ebd
      MSVCR120.dll!beginthreadex+0x107
      MSVCR120.dll!endthreadex+0x118
      KERNEL32.dll!BaseThreadInitThunk+0x22
      ntdll.dll!RtlUserThreadStart+0x34
      

      Then I took a minidump of the process and debugged it with Visual Studio, which gave the following stack traces:

      1 Thread:

      [External code: KERNELBASE.dll!SleepConditionVariableCS+0x28]
      __wt_cond_wait_signal (os_mtx_cond.c:101)
      __wt_cond_auto_wait_signal (cond_auto.c:63)
      __wt_evict_thread_run (evict_lru.c:311)
      __thread_run (thread_group.c:31)
      [External code: mscrt]
      

      1 Thread:

      [External code: KERNELBASE.dll!SleepConditionVariableCS+0x28]
      __wt_cond_wait_signal (os_mtx_cond.c:101)
      __sweep_server (conn_sweep.c:278)
      [External code: mscrt]
      

      1 Thread:

      [External code: KERNELBASE.dll!SleepConditionVariableCS+0x28]
      __wt_cond_wait (os_mtx_cond.c:101)
      __wt_readlock (mtx_rw.c:219)
      __wt_session_lock_dhandle (session_dhandle.c:183)
      __wt_session_get_btree (session_dhandle.c:506)
      __conn_btree_apply_internal (conn_dhandle.c:412)
      __wt_conn_btree_apply (conn_dhandle.c:474)
      __checkpoint_apply_all (txn_ckpt.c:189)
      __checkpoint_prepare (txn_ckpt.c:647)
      __txn_checkpoint (txn_ckpt.c:757)
      __txn_checkpoint_wrapper (txn_ckpt.c:947)
      __wt_txn_checkpoint (txn_ckpt.c:1003)
      __session_checkpoint (session_api.c:1658)
      __conn_open_session (conn_api.c:1179)
      [External code: WiredTigerNet::Session::Checkpoint]
      

      705 Threads:

      [External code]
      __wt_cond_wait_signal (os_mtx_cond.c:101)
      __wt_readlock (mtx_rw.c:219)
      __wt_session_lock_dhandle (session_dhandle.c:183)
      __wt_session_get_btree (session_dhandle.c:506)
      __wt_session_get_btree_ckpt (session_dhandle.c:346)
      __wt_curfile_open (cur_file.c:601)
      __session_open_cursor_int (session_api.c:386)
      __wt_open_cursor (session_api.c:439)
      __curtable_open_colgroups (cur_table.c:893)
      __wt_curtable_open (cur_table.c:1053)
      __session_open_cursor_int (session_api.c:346)
      __session_open_cursor (session_api.c:481)
      [External code: WiredTigerNet::OpenWiredTigerCursor]
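
The per-stack thread counts above ("1 Thread", "705 Threads") come from grouping threads that share an identical call stack. A minimal sketch of that grouping, assuming each thread's stack is already available as a list of frame names; the helper names `group_stacks` and `waiting_in` are hypothetical, and the stacks in the example are abbreviated versions of the traces in this report, not full dumps.

```python
from collections import Counter


def group_stacks(threads):
    """Group threads by identical stack.

    threads: iterable of lists of frame names, innermost frame first.
    Returns a list of (thread_count, frames), largest group first.
    """
    counts = Counter(tuple(frames) for frames in threads)
    groups = [(n, list(frames)) for frames, n in counts.items()]
    groups.sort(key=lambda group: group[0], reverse=True)
    return groups


def waiting_in(groups, frame):
    """Total number of threads whose stack contains the given frame."""
    return sum(n for n, frames in groups if frame in frames)


# Abbreviated stacks modelled on the traces in this report (innermost first).
checkpoint = ["__wt_readlock", "__wt_session_lock_dhandle", "__session_checkpoint"]
reader = ["__wt_readlock", "__wt_session_lock_dhandle", "__session_open_cursor"]
evict = ["__wt_cond_auto_wait_signal", "__wt_evict_thread_run"]

groups = group_stacks([checkpoint, evict] + [reader] * 705)
for count, frames in groups:
    print(count, "thread(s):", " <- ".join(frames))
print("threads in __wt_readlock:", waiting_in(groups, "__wt_readlock"))
```

Summing over a shared frame makes it easy to see that the checkpoint thread and all cursor-opening threads are parked in the same `__wt_readlock` call.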
      

      WiredTigerNet is a simple .NET wrapper for wiredtiger. I run wiredtiger operations on the .NET thread pool. When the deadlock occurs, the threads in the pool block, and the .NET thread pool then adds a new thread every second. That is why 705 threads are blocked in my case.
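
The thread count gives a rough estimate of how long the process had been stuck when the minidump was taken. A minimal sketch of that arithmetic, assuming the pool really adds exactly one thread per second (the rate stated above) and that every blocked thread was added after the deadlock began; the initial pool size is not given in this report, so it defaults to 0 here, and the function name is hypothetical.

```python
def blocked_duration_seconds(blocked_threads, initial_pool_size=0,
                             threads_added_per_second=1.0):
    """Estimate how long the pool has been blocked from its thread count.

    initial_pool_size is an assumption (unknown in the report); threads that
    already existed before the deadlock do not count toward the duration.
    """
    added = max(blocked_threads - initial_pool_size, 0)
    return added / threads_added_per_second


seconds = blocked_duration_seconds(705)
print(f"blocked for roughly {seconds:.0f} s ({seconds / 60:.2f} min)")
```

Under these assumptions, 705 blocked threads correspond to a little under 12 minutes of stalled requests.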

      For now I have gone back to v2.9.2. Can you try to fix the deadlock using the provided stack traces?

        1. stacktrace.txt
          181 kB
          Sergey Zagursky

            Assignee:
            michael.cahill@mongodb.com Michael Cahill (Inactive)
            Reporter:
            halex2005
            Votes:
            0
            Watchers:
            13
