WiredTiger / WT-7750

exclusive handle access fails if cache contains dirty data

    • Storage - Ra 2021-09-20

      While working on WT-7507, haribabu.kommi and I came across a problem in test_prepare_hs03.

      Any operation that requires exclusive access to an object (currently that list includes verify, salvage/rollback-to-stable, and upgrade) will first attempt to close all of the existing open handles and then open an exclusive handle on the object.

      If there are dirty updates in the cache for the object, as part of closing all open handles we call __wt_txn_checkpoint(), which hits this code:

          /*
           * Don't flush data from modified trees independent of system-wide checkpoint when either there
           * is a stable timestamp set or the connection is configured to disallow such operation.
           * Flushing trees can lead to files that are inconsistent on disk after a crash.
           */
          if (btree->modified && !bulk && !__wt_btree_immediately_durable(session) &&
            (S2C(session)->txn_global.has_stable_timestamp ||
              (!F_ISSET(S2C(session), WT_CONN_FILE_CLOSE_SYNC) && !metadata)))
              return (__wt_set_return(session, EBUSY));
      

      and returns EBUSY, and the operation fails.
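
To make the condition easier to reason about, the check above can be restated as a standalone predicate. This is only a sketch, not WiredTiger code: `struct close_state`, its field names, and `close_checkpoint_status()` are hypothetical stand-ins for the values the quoted expression reads.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state the quoted check reads; not a WiredTiger type. */
struct close_state {
    bool modified;             /* btree->modified */
    bool bulk;                 /* the bulk flag in the quoted check */
    bool immediately_durable;  /* result of __wt_btree_immediately_durable(session) */
    bool has_stable_timestamp; /* txn_global.has_stable_timestamp */
    bool file_close_sync;      /* WT_CONN_FILE_CLOSE_SYNC is set on the connection */
    bool metadata;             /* the metadata flag in the quoted check */
};

/* Return EBUSY when the quoted condition would refuse to flush the tree, else 0. */
static int
close_checkpoint_status(const struct close_state *s)
{
    if (s->modified && !s->bulk && !s->immediately_durable &&
      (s->has_stable_timestamp || (!s->file_close_sync && !s->metadata)))
        return (EBUSY);
    return (0);
}
```

Read this way, any modified, non-bulk, not-immediately-durable tree fails the close whenever a stable timestamp is set, regardless of the other flags.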

      This is easy to reproduce with test_prepare_hs03, and haribabu.kommi believes he's seen it where MongoDB reports EBUSY returns from collection validation. (As MongoDB surfaces the collection validation operation through its API, it makes sense a MongoDB application could see this failure.)

      alexander.gorrod, vamsi.krishna, we could potentially:

      • force a database-wide checkpoint as part of an operation requiring exclusive access to the object (if EBUSY is returned from our attempt to close all open handles, we could do a database-wide checkpoint and then try again).
      • document this away, although it's messy to do that because as soon as a checkpoint completes, then the failing operation can proceed, so it's a case of repeatedly trying until the operation succeeds.
      • haribabu.kommi thinks that this code may be too pessimistic, and that we may be able to relax the constraints, since with the history store the check may no longer be required.
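
For illustration, the first option (checkpoint and retry on EBUSY) might look roughly like the sketch below. All names here are hypothetical, and the callbacks are simulated so the loop can be exercised on its own. In WiredTiger the two callbacks would presumably wrap the exclusive operation (for example WT_SESSION::verify) and a database-wide checkpoint (WT_SESSION::checkpoint).

```c
#include <errno.h>

/* Hypothetical callback type: returns 0 on success or an errno-style code. */
typedef int (*step_fn)(void *cookie);

/*
 * Run an operation that needs exclusive access. If it fails with EBUSY,
 * force a checkpoint and try again, allowing up to max_retries retries.
 */
static int
run_exclusive_with_checkpoint(step_fn op, step_fn checkpoint, void *cookie, int max_retries)
{
    int attempt, ret;

    for (attempt = 0;; ++attempt) {
        if ((ret = op(cookie)) != EBUSY || attempt >= max_retries)
            return (ret);
        /* Dirty data blocked the handle close: flush it, then retry. */
        if ((ret = checkpoint(cookie)) != 0)
            return (ret);
    }
}

/* Simulated database state, used only to exercise the loop above. */
struct fake_db {
    int dirty;       /* non-zero while the cache holds dirty data */
    int op_calls;    /* number of times the exclusive operation ran */
    int checkpoints; /* number of forced checkpoints */
};

/* Simulated exclusive operation: fails with EBUSY while the tree is dirty. */
static int
fake_op(void *cookie)
{
    struct fake_db *db = cookie;

    ++db->op_calls;
    return (db->dirty ? EBUSY : 0);
}

/* Simulated checkpoint: writes out the dirty data. */
static int
fake_checkpoint(void *cookie)
{
    struct fake_db *db = cookie;

    ++db->checkpoints;
    db->dirty = 0;
    return (0);
}
```

Bounding the retries keeps the caller from spinning if new dirty data keeps arriving between the checkpoint and the next attempt. With `max_retries` set to 0 the helper degrades to today's behavior and simply returns EBUSY.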

      Anyway, can you folks weigh in on this one and give us some guidance?

        1. reproducer.py
          5 kB
          Etienne Petrel

            Assignee:
            keith.bostic@mongodb.com Keith Bostic (Inactive)
            Reporter:
            keith.bostic@mongodb.com Keith Bostic (Inactive)