WiredTiger / WT-8429

Improve error handling when rollback_to_stable fails

    • Type: Task
    • Resolution: Done
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: RTS
    • Storage Engines
    • 5
    • Megabat - 2024-05-14

      MongoDB passes 0 as the flags argument to __wt_session_get_dhandle when it is called from __rollback_to_stable_btree_apply. In SERVER-60335 and SERVER-58311 we've been exploring passing WT_DHANDLE_DISCARD | WT_DHANDLE_EXCLUSIVE to __wt_session_get_dhandle instead, so that we can detect when MongoDB still holds cursors while calling rollback_to_stable.
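
      To illustrate why the flags matter, here is a toy model of an exclusive handle request. It is a sketch, not WiredTiger code: toy_dhandle, toy_get_dhandle and TOY_EXCLUSIVE are hypothetical names, and returning EBUSY for a handle that is still in use is an assumption about the intended behaviour.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag standing in for WT_DHANDLE_EXCLUSIVE. */
#define TOY_EXCLUSIVE 0x1u

/* Toy data handle: only tracks how many cursors are open on it. */
typedef struct {
    unsigned in_use;
} toy_dhandle;

/*
 * Toy stand-in for a handle lookup. With no flags the request always
 * succeeds; with TOY_EXCLUSIVE it fails while any cursor is still open.
 */
static int
toy_get_dhandle(const toy_dhandle *dh, unsigned flags)
{
    if ((flags & TOY_EXCLUSIVE) != 0 && dh->in_use > 0)
        return (EBUSY); /* assumed error code for "handle still in use" */
    return (0);
}
```

      With flags of 0 a leftover cursor goes unnoticed; with the exclusive flag the same call fails, which is the detection the two SERVER tickets were after.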

      However, this turned out to be difficult to enable. When the query subsystem yields execution, we release our locks and wait to be resumed. If another thread needs to call rollback_to_stable, all user threads are interrupted with an interruption exception: the operation is torn down and various destructors are called, including the one holding the cursor. The thread that is about to call rollback_to_stable does so after taking the global lock in exclusive mode, but that lock does not synchronize it with the threads that are still tearing down after the interruption exception. In addition, PM-2451 is ongoing and will explicitly keep cursors open across getMore and yield.

      So instead of trying to enable this assertion, it would be very useful if rollback_to_stable could mark all open cursors and return an error on the user thread if they are ever used again with next or a similar call. This could also help debug issues like BF-23021, where a transaction is active concurrently with rollback_to_stable.
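
      One possible shape for this request is sketched below. It swaps per-cursor marking for a generation counter, which invalidates every open cursor in one step. All names (toy_conn, toy_cursor, TOY_CURSOR_STALE and the functions) are hypothetical, and the sketch is single-threaded; a real implementation would need atomic or otherwise synchronized access to the counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error code for this sketch; not a WiredTiger constant. */
#define TOY_CURSOR_STALE (-1000)

/* Toy connection: a counter bumped by every rollback. */
typedef struct {
    unsigned long rollback_gen;
} toy_conn;

/* Toy cursor: remembers the generation it was opened in. */
typedef struct {
    toy_conn *conn;
    unsigned long opened_gen;
    int pos;
} toy_cursor;

static void
toy_cursor_open(toy_conn *conn, toy_cursor *c)
{
    c->conn = conn;
    c->opened_gen = conn->rollback_gen;
    c->pos = 0;
}

/* Stand-in for rollback_to_stable: every open cursor becomes stale. */
static void
toy_rollback_to_stable(toy_conn *conn)
{
    conn->rollback_gen++;
}

/* Advance the cursor, or report an error if a rollback happened since open. */
static int
toy_cursor_next(toy_cursor *c)
{
    if (c->opened_gen != c->conn->rollback_gen)
        return (TOY_CURSOR_STALE);
    c->pos++;
    return (0);
}
```

      A cursor opened before the rollback keeps working until toy_rollback_to_stable runs; after that, toy_cursor_next returns the error on the user thread instead of touching the tree, and the cursor position is left unchanged.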

       

            Assignee:
            ravi.giri@mongodb.com Ravi Giri
            Reporter:
            henrik.edin@mongodb.com Henrik Edin
            Votes:
            0
            Watchers:
            8

              Created:
              Updated:
              Resolved: