WiredTiger / WT-11566

Fast truncate hang when applying commit timestamp

    • 5
    • BermudaTriangle- 2023-09-05, TheMoon-StorEng - 2023-09-19

      While writing a cppsuite test for background compaction, I encountered a hang during a fast truncate.

      Two threads are of concern, and both show the same stacks in every reproduction.
      Thread 1 (Truncate):

      Thread 32 (Thread 0x7fffc0ff9640 (LWP 1237916) "run"):
      #0  0x00007ffff7708cab in sched_yield () from /lib/x86_64-linux-gnu/libc.so.6
      #1  0x00007ffff7cc09d1 in __wt_yield () at /home/ubuntu/wiredtiger/src/os_posix/os_yield.c:25
      #2  0x00007ffff7ab690d in __wt_txn_op_delete_commit_apply_timestamps (session=0x7ffff78ce9d0, ref=0x7fffa801bfa0) at /home/ubuntu/wiredtiger/src/include/txn_inline.h:313
      #3  0x00007ffff7ab5015 in __wt_txn_op_set_timestamp (session=0x7ffff78ce9d0, op=0x7fffa02c3a10) at /home/ubuntu/wiredtiger/src/include/txn_inline.h:392
      #4  0x00007ffff7aaf3b5 in __wt_txn_modify_page_delete (session=0x7ffff78ce9d0, ref=0x7fffa801bfa0) at /home/ubuntu/wiredtiger/src/include/txn_inline.h:471
      #5  0x00007ffff7aae562 in __wt_delete_page (session=0x7ffff78ce9d0, ref=0x7fffa801bfa0, skipp=0x7fffc0ff35d8) at /home/ubuntu/wiredtiger/src/btree/bt_delete.c:192
      #6  0x00007ffff7b2ba50 in __tree_walk_internal (session=0x7ffff78ce9d0, refp=0x7fffa021cdc0, walkcntp=0x0, skip_func=0x7ffff7a78fc0 <__wt_btcur_skip_page>, func_cookie=0x0, flags=1808) at /home/ubuntu/wiredtiger/src/btree/bt_walk.c:441
      #7  0x00007ffff7b2c019 in __wt_tree_walk_custom_skip (session=0x7ffff78ce9d0, refp=0x7fffa021cdc0, skip_func=0x7ffff7a78fc0 <__wt_btcur_skip_page>, func_cookie=0x0, flags=1552) at /home/ubuntu/wiredtiger/src/btree/bt_walk.c:553
      #8  0x00007ffff7a76760 in __wt_btcur_next (cbt=0x7fffa021cbe0, truncating=true) at /home/ubuntu/wiredtiger/src/btree/bt_curnext.c:930
      #9  0x00007ffff7a99131 in __wt_cursor_truncate (start=0x7fffa021cbe0, stop=0x7fffa028cd20, rmfunc=0x7ffff7a963e0 <__cursor_row_modify>) at /home/ubuntu/wiredtiger/src/btree/bt_cursor.c:2069
      #10 0x00007ffff7a994d6 in __wt_btcur_range_truncate (trunc_info=0x7fffc0ff5148) at /home/ubuntu/wiredtiger/src/btree/bt_cursor.c:2219
      #11 0x00007ffff7d301cf in __wt_schema_range_truncate (trunc_info=0x7fffc0ff5148) at /home/ubuntu/wiredtiger/src/schema/schema_truncate.c:174
      #12 0x00007ffff7d36277 in __wt_session_range_truncate (session=0x7ffff78ce9d0, uri=0x0, start=0x7fffa021cbe0, stop=0x7fffa028cd20) at /home/ubuntu/wiredtiger/src/session/session_api.c:1639
      #13 0x00007ffff7d44650 in __session_truncate (wt_session=0x7ffff78ce9d0, uri=0x0, start=0x7fffa021cbe0, stop=0x7fffa028cd20, config=0x0) at /home/ubuntu/wiredtiger/src/session/session_api.c:1741
      #14 0x0000000000458ad8 in test_harness::thread_worker::truncate (this=0x7fffdc042fa0, collection_id=2, start_key=..., stop_key=..., config=...) at /home/ubuntu/wiredtiger/test/cppsuite/src/main/thread_worker.cpp:253
      #15 0x0000000000430c51 in test_harness::background_compact::remove_operation (this=0x7fffffffcf58, tw=0x7fffdc042fa0) at /home/ubuntu/wiredtiger/test/cppsuite/tests/background_compact.cpp:200
      

      Thread 2 (Checkpoint):

      Thread 23 (Thread 0x7fffe57fa640 (LWP 1237907) "run"):
      #0  0x00007ffff771b7ed in select () from /lib/x86_64-linux-gnu/libc.so.6
      #1  0x00007ffff7cc013e in __wt_sleep (seconds=0, micro_seconds=1000) at /home/ubuntu/wiredtiger/src/os_posix/os_sleep.c:30
      #2  0x00007ffff7adae3e in __wt_spin_backoff (yield_count=0x7fffe57f4110, sleep_usecs=0x7fffe57f4118) at /home/ubuntu/wiredtiger/src/include/misc_inline.h:197
      #3  0x00007ffff7ad972e in __wt_page_in_func (session=0x7ffff78cbf10, ref=0x7fffa801bfa0, flags=6558, func=0x7ffff7de5c34 "int __tree_walk_internal(WT_SESSION_IMPL *, WT_REF **, uint64_t *, int (*)(WT_SESSION_IMPL *, WT_REF *, void *, _Bool, _Bool *), void *, uint32_t)", line=461) at /home/ubuntu/wiredtiger/src/btree/bt_read.c:495
      #4  0x00007ffff7b2cdb8 in __wt_page_swap_func (session=0x7ffff78cbf10, held=0x0, want=0x7fffa801bfa0, flags=6558, func=0x7ffff7de5c34 "int __tree_walk_internal(WT_SESSION_IMPL *, WT_REF **, uint64_t *, int (*)(WT_SESSION_IMPL *, WT_REF *, void *, _Bool, _Bool *), void *, uint32_t)", line=461) at /home/ubuntu/wiredtiger/src/include/btree_inline.h:2201
      #5  0x00007ffff7b2bb6a in __tree_walk_internal (session=0x7ffff78cbf10, refp=0x7fffe57f6068, walkcntp=0x0, skip_func=0x7ffff7b0c2a0 <__sync_page_skip>, func_cookie=0x0, flags=6426) at /home/ubuntu/wiredtiger/src/btree/bt_walk.c:460
      #6  0x00007ffff7b2c019 in __wt_tree_walk_custom_skip (session=0x7ffff78cbf10, refp=0x7fffe57f6068, skip_func=0x7ffff7b0c2a0 <__sync_page_skip>, func_cookie=0x0, flags=6170) at /home/ubuntu/wiredtiger/src/btree/bt_walk.c:553
      #7  0x00007ffff7b0b29c in __wt_sync_file (session=0x7ffff78cbf10, syncop=WT_SYNC_CHECKPOINT) at /home/ubuntu/wiredtiger/src/btree/bt_sync.c:350
      #8  0x00007ffff7dbf0b8 in __checkpoint_tree (session=0x7ffff78cbf10, is_checkpoint=true, cfg=0x7fffe57f8140) at /home/ubuntu/wiredtiger/src/txn/txn_ckpt.c:2269
      #9  0x00007ffff7dc3ef8 in __checkpoint_tree_helper (session=0x7ffff78cbf10, cfg=0x7fffe57f8140) at /home/ubuntu/wiredtiger/src/txn/txn_ckpt.c:2394
      #10 0x00007ffff7dc3de2 in __checkpoint_apply_to_dhandles (session=0x7ffff78cbf10, cfg=0x7fffe57f8140, op=0x7ffff7dc3e40 <__checkpoint_tree_helper>) at /home/ubuntu/wiredtiger/src/txn/txn_ckpt.c:351
      #11 0x00007ffff7dc0be9 in __txn_checkpoint (session=0x7ffff78cbf10, cfg=0x7fffe57f8140) at /home/ubuntu/wiredtiger/src/txn/txn_ckpt.c:1154
      #12 0x00007ffff7dbe4c8 in __txn_checkpoint_wrapper (session=0x7ffff78cbf10, cfg=0x7fffe57f8140) at /home/ubuntu/wiredtiger/src/txn/txn_ckpt.c:1443
      #13 0x00007ffff7dbdea8 in __wt_txn_checkpoint (session=0x7ffff78cbf10, cfg=0x7fffe57f8140, waiting=true) at /home/ubuntu/wiredtiger/src/txn/txn_ckpt.c:1519
      #14 0x00007ffff7d4ae78 in __session_checkpoint (wt_session=0x7ffff78cbf10, config=0x0) at /home/ubuntu/wiredtiger/src/session/session_api.c:2370
      #15 0x000000000044f1a7 in test_harness::database_operation::checkpoint_operation (this=0x7fffffffcf58, tc=0x7fffdc03ef90) at /home/ubuntu/wiredtiger/test/cppsuite/src/main/database_operation.cpp:154
      

      At this point, the truncate thread is spinning in __wt_txn_op_delete_commit_apply_timestamps, yielding while it waits on a WT_REF lock that appears to be already held. The checkpoint thread is waiting on the same WT_REF (ref=0x7fffa801bfa0 in both stacks) and hangs inside __wt_page_in_func.
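
      To make the symptom concrete, here is a minimal sketch of a state-based lock acquired by compare-and-swap, with a bounded spin so that a stuck lock is reported instead of hanging. This is not WiredTiger code: ref_t, REF_MEM, REF_LOCKED, ref_lock_bounded and ref_unlock are hypothetical names, and the sketch only assumes that the WT_REF lock behaves like a non-reentrant state flag. It does not establish who holds the lock in this hang.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for page reference states; not WiredTiger's definitions. */
enum { REF_MEM = 0, REF_LOCKED = 1 };

typedef struct {
    _Atomic uint8_t state;
} ref_t;

/*
 * Try to switch the ref from its current unlocked state to REF_LOCKED.
 * Gives up after max_spins attempts instead of spinning forever.
 * On success, returns true and stores the previous state in *prevp.
 */
static bool
ref_lock_bounded(ref_t *ref, uint8_t *prevp, unsigned max_spins)
{
    for (unsigned i = 0; i < max_spins; ++i) {
        uint8_t cur = atomic_load(&ref->state);
        if (cur != REF_LOCKED &&
            atomic_compare_exchange_strong(&ref->state, &cur, REF_LOCKED)) {
            *prevp = cur;
            return true;
        }
        /* Someone holds the lock (possibly this same thread): yield and retry. */
        sched_yield();
    }
    return false;
}

/* Release the lock by restoring the state saved at acquisition time. */
static void
ref_unlock(ref_t *ref, uint8_t prev)
{
    assert(atomic_load(&ref->state) == REF_LOCKED);
    atomic_store(&ref->state, prev);
}
```

      With a lock of this shape, a second ref_lock_bounded call made while the state is still REF_LOCKED fails no matter which thread makes it, because the lock records no owner. An unbounded version of the same loop would spin in sched_yield indefinitely, which matches what the truncate stack looks like from the outside.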

      To reproduce, run the cppsuite test attached to the ticket.

        1. background_compact.cpp
          10 kB
        2. background_compact_default.txt
          0.9 kB

            Assignee: Sean Watt (sean.watt@mongodb.com)
            Reporter: Sean Watt (sean.watt@mongodb.com)
            Votes: 0
            Watchers: 5
