WiredTiger / WT-4046

Hang between racing compact and alter calls in LSM


      A run of test/format hung while running compact and alter commands concurrently on an LSM tree. The two relevant call stacks are:

      Thread 3 (Thread 0x7f7776f7f700 (LWP 15008)):
      #0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
      #1  0x00007f7790afad02 in _L_lock_791 () from /lib64/libpthread.so.0
      #2  0x00007f7790afac08 in __GI___pthread_mutex_lock (mutex=0x62c000000800) at pthread_mutex_lock.c:64
      #3  0x0000000000629750 in __wt_spin_lock (session=0x7f779156c0c0, t=0x62c000000800) at ../src/include/mutex.i:173
      #4  0x000000000061a3c5 in __lsm_tree_close (session=0x7f779156c0c0, lsm_tree=0x615000000800, final=false) at ../src/lsm/lsm_tree.c:135
      #5  0x000000000061fb08 in __lsm_tree_find (session=0x7f779156c0c0, uri=0x6190001e0600 "lsm:wt", exclusive=true, treep=0x7f7776f7dea0) at ../src/lsm/lsm_tree.c:430
      #6  0x000000000061e587 in __wt_lsm_tree_get (session=0x7f779156c0c0, uri=0x6190001e0600 "lsm:wt", exclusive=true, treep=0x7f7776f7dea0) at ../src/lsm/lsm_tree.c:578
      #7  0x0000000000628bf7 in __wt_lsm_tree_worker (session=0x7f779156c0c0, uri=0x6190001e0600 "lsm:wt", file_func=0xacf630 <__alter_file>, name_func=0x0, cfg=0x7f7776f7ec40, open_flags=336) at ../src/lsm/lsm_tree.c:1386
      #8  0x0000000000acf574 in __schema_alter (session=0x7f779156c0c0, uri=0x6190001e0600 "lsm:wt", newcfg=0x7f7776f7ec40) at ../src/schema/schema_alter.c:208
      #9  0x0000000000acfe44 in __alter_tree (session=0x7f779156c0c0, name=0x6020000039f0 "colgroup:wt", newcfg=0x7f7776f7ec40) at ../src/schema/schema_alter.c:116
      #10 0x0000000000ad08b1 in __alter_table (session=0x7f779156c0c0, uri=0x611000000040 "table:wt", newcfg=0x7f7776f7ec40) at ../src/schema/schema_alter.c:166
      #11 0x0000000000acf606 in __schema_alter (session=0x7f779156c0c0, uri=0x611000000040 "table:wt", newcfg=0x7f7776f7ec40) at ../src/schema/schema_alter.c:211
      #12 0x0000000000acf281 in __wt_schema_alter (session=0x7f779156c0c0, uri=0x611000000040 "table:wt", newcfg=0x7f7776f7ec40) at ../src/schema/schema_alter.c:227
      #13 0x00000000007048cd in __session_alter (wt_session=0x7f779156c0c0, uri=0x611000000040 "table:wt", config=0x7f7776f7ed60 "access_pattern_hint=random") at ../src/session/session_api.c:689
      

      The alter thread is executing this call:

                      WT_WITHOUT_LOCKS(session,
                          __wt_lsm_manager_clear_tree(session, lsm_tree));
      

      From the stack trace, the alter thread must be in the lock re-acquisition phase of the WT_WITHOUT_LOCKS macro.
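To make the "re-acquisition phase" concrete, here is a minimal sketch of a drop, run, re-acquire helper. It is not the WT_WITHOUT_LOCKS implementation, which this ticket does not show: `held_lock`, `probe_op` and `run_without_lock` are hypothetical names, and a single pthread mutex stands in for whatever set of locks the real macro releases.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a lock the caller already holds. */
static pthread_mutex_t held_lock = PTHREAD_MUTEX_INITIALIZER;

/* Set by the callback if it found held_lock free while it ran. */
static bool lock_was_free_in_op;

/* Fail loudly if a pthread call returns an error. */
static void
must(int ret)
{
    assert(ret == 0);
    (void)ret;
}

/* Hypothetical callback: probes the lock state without blocking. */
static int
probe_op(void)
{
    if (pthread_mutex_trylock(&held_lock) == 0) {
        lock_was_free_in_op = true;
        must(pthread_mutex_unlock(&held_lock));
    }
    return (0);
}

/*
 * run_without_lock --
 *     Drop the lock, run op, then re-acquire the lock before returning.
 */
static int
run_without_lock(pthread_mutex_t *lock, int (*op)(void))
{
    int ret;

    must(pthread_mutex_unlock(lock)); /* Phase 1: release. */
    ret = op();                       /* Phase 2: run unlocked. */
    must(pthread_mutex_lock(lock));   /* Phase 3: re-acquire, may block. */
    return (ret);
}
```

A caller that holds `held_lock` would invoke `run_without_lock(&held_lock, probe_op)`. Phase 3 is the step that can wait forever if another thread takes the lock during phase 2 and never releases it.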

      The compact thread is blocked waiting for a read lock on a data handle:

      Thread 2 (Thread 0x7f777573a700 (LWP 15011)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      #1  0x0000000000660246 in __wt_cond_wait_signal (session=0x7f779156e280, cond=0x60c0000025c0, usecs=10000, run_func=0x76a8b0 <__read_blocked>, signalled=0x7f7775738780) at ../src/os_posix/os_mtx_cond.c:122
      #2  0x000000000076a636 in __wt_cond_wait (session=0x7f779156e280, cond=0x60c0000025c0, usecs=10000, run_func=0x76a8b0 <__read_blocked>) at ../src/include/misc.i:19
      #3  0x0000000000769fe0 in __wt_readlock (session=0x7f779156e280, l=0x616000003380) at ../src/support/mtx_rw.c:257
      #4  0x000000000074c963 in __wt_session_lock_dhandle (session=0x7f779156e280, flags=0, is_deadp=0x7f7775738ee0) at ../src/session/session_dhandle.c:183
      #5  0x000000000074fd17 in __wt_session_get_dhandle (session=0x7f779156e280, uri=0x611000000040 "table:wt", checkpoint=0x0, cfg=0x0, flags=0) at ../src/session/session_dhandle.c:510
      #6  0x00000000006dc357 in __wt_schema_get_table_uri (session=0x7f779156e280, uri=0x611000000040 "table:wt", ok_incomplete=false, flags=0, tablep=0x7f77757392a0) at ../src/schema/schema_list.c:28
      #7  0x00000000006f57cf in __wt_schema_worker (session=0x7f779156e280, uri=0x611000000040 "table:wt", file_func=0x749180 <__compact_handle_append>, name_func=0x7497b0 <__compact_uri_analyze>, cfg=0x7f7775739c20, open_flags=0) at ../src/schema/schema_worker.c:97
      #8  0x000000000074773c in __wt_session_compact (wt_session=0x7f779156e280, uri=0x611000000040 "table:wt", config=0x0) at ../src/session/session_compact.c:409
      #9  0x0000000000518d9f in compact (arg=0x0) at ../../../test/format/compact.c:74
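One interleaving consistent with the two stacks is a lock-order inversion, sketched below. This is an assumption, not something the traces prove: the ticket does not show which thread holds the spin lock that alter is waiting on, or which thread holds the handle lock that compact is waiting on. `global_lock` and `handle_lock` are hypothetical plain mutexes standing in for the spin lock and the data handle read/write lock. Both roles are played in order by a single thread, with `pthread_mutex_trylock` marking each point where a real thread would block, so the sketch itself never hangs.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not WiredTiger's real locks. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t handle_lock = PTHREAD_MUTEX_INITIALIZER;

/* Fail loudly if a pthread call returns an error. */
static void
must(int ret)
{
    assert(ret == 0);
    (void)ret;
}

/*
 * simulate_inversion --
 *     Replay the assumed interleaving; return true if both roles end up
 *     waiting on a lock held by the other.
 */
static bool
simulate_inversion(void)
{
    int alter_ret, compact_ret;

    /* "alter": take the global lock, then exclusive access to the handle. */
    must(pthread_mutex_lock(&global_lock));
    must(pthread_mutex_lock(&handle_lock));

    /* "alter": enter the without-locks region, dropping only the global lock. */
    must(pthread_mutex_unlock(&global_lock));

    /* "compact": take the global lock in that window, then want the handle. */
    must(pthread_mutex_lock(&global_lock));
    compact_ret = pthread_mutex_trylock(&handle_lock);

    /* "alter": leave the region and try to re-acquire the global lock. */
    alter_ret = pthread_mutex_trylock(&global_lock);

    /* Clean up so the mutexes are reusable. */
    must(pthread_mutex_unlock(&global_lock));
    must(pthread_mutex_unlock(&handle_lock));

    return (alter_ret == EBUSY && compact_ret == EBUSY);
}
```

If this assumption holds, neither thread can make progress: alter keeps the handle while waiting for the global lock, and compact keeps the global lock while waiting for the handle.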
      

      The job that hung was:
      http://build.wiredtiger.com:8080/job/wiredtiger-test-format-stress-sanitizer/19916/

            Assignee:
            backlog-server-storage-engines [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            alexander.gorrod@mongodb.com Alexander Gorrod