WiredTiger / WT-6204

Possible race between backup and checkpoint at file close

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: WT10.0.1, 4.4.7, 5.0.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • 8
    • Storage - Ra 2021-05-03

      It is possible that we have an adverse race condition between file close and backup.

      The specific concern arises when we take a checkpoint while closing a file. The sequence in which this happens is (roughly):

      1. Allocate and initialize a new WT_CKPT in wt_meta_ckptlist_get().
      2. Mark all other checkpoints for deletion in drop().
      3. Reduce the set of checkpoints being deleted if there is a hot backup or if the list is too big. This happens in checkpoint_lock_dirty_tree_int().
      4. Call checkpoint_tree() to create the new checkpoint.

      Normally, this is all done while holding the checkpoint lock.

      When a user opens a backup cursor, we copy the metadata for the backup to the WiredTiger.backup file. This metadata includes the currently extant checkpoints for each file. Backup acquires the checkpoint and schema locks when doing this, so in the normal case this prevents races with the operations above.
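
To illustrate why a shared lock is enough in the normal case, here is a minimal sketch using a plain pthread mutex. It is a toy model only: `toy_checkpoint_lock`, `toy_checkpoint()`, `toy_backup_open()` and `run_locked_race()` are hypothetical names for illustration, not WiredTiger functions, and the real code also involves the schema lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the checkpoint lock; not a WiredTiger API. */
static pthread_mutex_t toy_checkpoint_lock = PTHREAD_MUTEX_INITIALIZER;
static bool in_checkpoint_sequence; /* true while steps 1-4 are in progress */
static int meta_latest;             /* newest checkpoint ID listed in the metadata */

struct toy_backup_result {
    bool saw_partial;  /* backup observed a half-finished checkpoint sequence */
    int recorded_ckpt; /* checkpoint ID the backup copied from the metadata */
};

/* Steps 1-4 of the sequence, all performed under the lock. */
static void *toy_checkpoint(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&toy_checkpoint_lock);
    in_checkpoint_sequence = true;  /* steps 1-3: allocate, mark, trim */
    meta_latest = 2;                /* step 4: publish the new checkpoint */
    in_checkpoint_sequence = false;
    pthread_mutex_unlock(&toy_checkpoint_lock);
    return NULL;
}

/* Backup cursor open: snapshot the metadata under the same lock. */
static void *toy_backup_open(void *arg)
{
    struct toy_backup_result *res = arg;
    pthread_mutex_lock(&toy_checkpoint_lock);
    res->saw_partial = in_checkpoint_sequence;
    res->recorded_ckpt = meta_latest;
    pthread_mutex_unlock(&toy_checkpoint_lock);
    return NULL;
}

/* Run both operations concurrently and report what the backup observed. */
static struct toy_backup_result run_locked_race(void)
{
    struct toy_backup_result res = {true, 0};
    pthread_t ckpt, backup;

    in_checkpoint_sequence = false;
    meta_latest = 1;
    if (pthread_create(&ckpt, NULL, toy_checkpoint, NULL) != 0)
        return res;
    if (pthread_create(&backup, NULL, toy_backup_open, &res) != 0) {
        pthread_join(ckpt, NULL);
        return res;
    }
    pthread_join(ckpt, NULL);
    pthread_join(backup, NULL);
    return res;
}
```

Whichever thread wins the lock, the backup records either the old or the new checkpoint, never a state from the middle of the sequence. The concern in this ticket is a code path that performs the same steps without taking the lock.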

      But when closing a file, we write a final checkpoint of the file in wt_checkpoint_close(). This code path does not appear to hold the checkpoint lock. Some, but not all, callers hold the schema lock, which would also protect against concurrent backup operations.

      The possible race happens if backup can overlap with the operations above. If the backup cursor is created after step #3 and before the checkpoint is created and entered in the metadata in step #4, the backup might include the old checkpoint that is about to be deleted rather than the one that is about to be created. In the likely event that the backup user doesn't copy the file until after the new checkpoint has been synced to disk, the backup metadata will be inconsistent with the backup copy of the file.
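
The interleaving described above can be sketched as a small deterministic model. This is an illustration under simplifying assumptions (one file, one old checkpoint with ID 1, one new checkpoint with ID 2); every name in it is hypothetical and none of it is WiredTiger code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one file's checkpoint state; all names are hypothetical. */
struct toy_file {
    bool old_exists;        /* old checkpoint (ID 1) is still in the file */
    bool new_exists;        /* new checkpoint (ID 2) is in the file */
    bool old_marked_delete; /* old checkpoint is scheduled for deletion */
    bool hot_backup;        /* a backup cursor is open */
    int meta_latest;        /* checkpoint ID listed in the metadata */
};

enum backup_point { BACKUP_BEFORE_STEP3, BACKUP_BETWEEN_STEP3_AND_4, BACKUP_AFTER_STEP4 };

/* Backup cursor open: flag the hot backup and copy the metadata. */
static int backup_open(struct toy_file *f)
{
    f->hot_backup = true;
    return f->meta_latest;
}

/* Step 2: mark the existing checkpoint for deletion. */
static void step2_mark(struct toy_file *f) { f->old_marked_delete = true; }

/* Step 3: keep checkpoints that an open hot backup may still need. */
static void step3_trim(struct toy_file *f)
{
    if (f->hot_backup)
        f->old_marked_delete = false;
}

/* Step 4: write the new checkpoint, drop marked ones, update the metadata. */
static void step4_create(struct toy_file *f)
{
    f->new_exists = true;
    if (f->old_marked_delete)
        f->old_exists = false;
    f->meta_latest = 2;
}

/* True if the checkpoint the backup recorded is still present at copy time. */
static bool backup_is_consistent(enum backup_point when)
{
    struct toy_file f = {true, false, false, false, 1};
    int recorded = 0;

    if (when == BACKUP_BEFORE_STEP3)
        recorded = backup_open(&f);
    step2_mark(&f);
    step3_trim(&f);
    if (when == BACKUP_BETWEEN_STEP3_AND_4)
        recorded = backup_open(&f); /* the unprotected window */
    step4_create(&f);
    if (when == BACKUP_AFTER_STEP4)
        recorded = backup_open(&f);

    /* The backup user copies the file only after the new checkpoint is durable. */
    return recorded == 1 ? f.old_exists : f.new_exists;
}
```

In this model only the `BACKUP_BETWEEN_STEP3_AND_4` case is inconsistent: step 3 has already decided that no backup needs the old checkpoint, so step 4 removes it even though the backup metadata still names it.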

      I spent a bit of time inspecting the code and couldn't find a point of synchronization that prevents this. (Admittedly, I am inexpert in this code.) I also attempted, unsuccessfully, to produce a test that would trigger the race. In particular, wt_checkpoint_close() only creates a checkpoint for normal files if its final parameter is set to false. I couldn't find a test case that would call wt_checkpoint_close() with final == false.

            Assignee:
            jie.chen@mongodb.com Jie Chen
            Reporter:
            keith.smith@mongodb.com Keith Smith
            Votes:
            0
            Watchers:
            6
