WiredTiger / WT-9954

Fix assertion failure due to handle close during checkpoint_flush_tier()

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: WT11.3.0, 8.0.0-rc0, 7.3.0-rc2
    • Affects Version/s: None
    • Component/s: None
    • None
    • Storage Engines

      After merging WT-9881, test_bulk01 started failing when run with the tiered hook.

      It triggers an assertion in __txn_checkpoint_establish_time():

      WT_ASSERT(session, session->current_ckpt_sec == 0); 

      Here's a call stack taken from an Evergreen failure:

      #1  0x00007f70b6e37859 in __GI_abort () at abort.c:79
      #2  0x00007f70b5a9cc85 in __wt_abort (session=session@entry=0x55e38ea5fcd0) at ../src/os_common/os_abort.c:30
      #3  0x00007f70b5b3ffa5 in __txn_checkpoint_establish_time (session=session@entry=0x55e38ea5fcd0) at ../src/txn/txn_ckpt.c:952
      #4  0x00007f70b5b45c2f in __wt_checkpoint_close (session=session@entry=0x55e38ea5fcd0, final=final@entry=false) at ../src/txn/txn_ckpt.c:2478
      #5  0x00007f70b59dbcd6 in __wt_conn_dhandle_close (session=session@entry=0x55e38ea5fcd0, final=final@entry=false, mark_dead=mark_dead@entry=false) at ../src/conn/conn_dhandle.c:382
      #6  0x00007f70b5b17c2f in __wt_session_release_dhandle (session=session@entry=0x55e38ea5fcd0) at ../src/session/session_dhandle.c:259
      #7  0x00007f70b5b3fd2b in __checkpoint_flush_tier (session=session@entry=0x55e38ea5fcd0, force=force@entry=true) at ../src/txn/txn_ckpt.c:145
      #8  0x00007f70b5b42f35 in __checkpoint_prepare (session=session@entry=0x55e38ea5fcd0, trackingp=trackingp@entry=0x7ffd1bf965ee, cfg=cfg@entry=0x7ffd1bf96750) at ../src/txn/txn_ckpt.c:791
      #9  0x00007f70b5b44062 in __txn_checkpoint (session=session@entry=0x55e38ea5fcd0, cfg=0x7ffd1bf96750) at ../src/txn/txn_ckpt.c:1096
      #10 0x00007f70b5b4554c in __txn_checkpoint_wrapper (session=session@entry=0x55e38ea5fcd0, cfg=cfg@entry=0x7ffd1bf96750) at ../src/txn/txn_ckpt.c:1394
      #11 0x00007f70b5b4574d in __wt_txn_checkpoint (session=session@entry=0x55e38ea5fcd0, cfg=cfg@entry=0x7ffd1bf96750, waiting=waiting@entry=true) at ../src/txn/txn_ckpt.c:1471
      #12 0x00007f70b5b101c7 in __session_checkpoint (wt_session=0x55e38ea5fcd0, config=0x55e38f1f0be0 "flush_tier=(enabled,force=true)") at ../src/session/session_api.c:2200
      #13 0x00007f70b614cdcc in _wrap_Session_checkpoint (self=<optimized out>, args=<optimized out>) at lang/python/wiredtigerPYTHON_wrap.c:7558
      #14 0x00007f70b6b694b8 in cfunction_call (func=0x7f70b623e6d0, args=<optimized out>, kwargs=<optimized out>) at ../src/Python-3.9.2/Objects/methodobject.c:548
      #15 0x00007f70b6b5ea06 in _PyObject_MakeTpCall (tstate=0x55e38d6d4000, callable=0x7f70b623e6d0, args=<optimized out>, nargs=<optimized out>, keywords=<optimized out>) at ../src/Python-3.9.2/Include/internal/pycore_pyerrors.h:14

      The problem here is the logic around session->current_ckpt_sec. It is set in __txn_checkpoint_establish_time(), which assumes the field is set exactly once at the start of a checkpoint and cleared when the checkpoint completes. In this stack, we are in the checkpoint prepare phase, and __checkpoint_flush_tier() has closed a dhandle, which closes and checkpoints the underlying file. Checkpointing that file in __wt_checkpoint_close() calls __txn_checkpoint_establish_time() a second time while the outer checkpoint's value is still set, so session->current_ckpt_sec is nonzero and the assertion fails.

            Assignee:
            sue.loverso@mongodb.com Susan LoVerso
            Reporter:
            keith.smith@mongodb.com Keith Smith
            Votes:
            0
            Watchers:
            3
