  WiredTiger: WT-9944

Truncate write skew vs. format mirroring

    • Type: Bug
    • Resolution: Works as Designed
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None

      It came up during the WT-9715 discussion that some combination of eviction, checkpointing, and logging might allow a truncate write-skew situation (as described in WT-4158) to provoke a mirror failure.

      While this doesn't seem to be what's causing WT-9715, unfortunately I think it is possible.

      For this to happen, one first needs to create the write-skew situation; that is, a key with a concurrent insert and truncate. The runtime behavior is that the inserted key is not truncated, because the insert is not visible to the truncate. However, with logged tables, if the insert commits before the truncate, the recovery behavior is that the inserted key is truncated, because the operations are replayed in commit order and the transactions are no longer concurrent.
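      A minimal sketch of the two behaviors, written as a toy Python model rather than WiredTiger code. It assumes a table is a plain dict, the truncate's snapshot is just the set of keys visible to it, and recovery replays the truncate as a range operation over whatever keys are present; all function names are hypothetical.

```python
# Toy model of truncate write skew (hypothetical; not WiredTiger code).
# A table is a dict of int keys; a snapshot is the set of keys visible to a transaction.

def runtime_truncate(table, snapshot_keys, lo, hi):
    # Runtime semantics: the truncate only removes keys visible in its snapshot.
    for k in list(table):
        if lo <= k <= hi and k in snapshot_keys:
            del table[k]

def replay_truncate(table, lo, hi):
    # Recovery semantics (assumed): the truncate is replayed as a range
    # operation and removes every key currently in the range.
    for k in list(table):
        if lo <= k <= hi:
            del table[k]

def runtime_outcome():
    table = {1: "a", 2: "b", 3: "c"}
    snapshot = set(table)   # truncate transaction starts and sees keys 1..3
    table[5] = "new"        # concurrent insert of key 5 commits; invisible to the truncate
    runtime_truncate(table, snapshot, 1, 10)
    return table

def recovery_outcome():
    table = {1: "a", 2: "b", 3: "c"}
    # Log replay in commit order: the insert committed first, then the truncate.
    table[5] = "new"
    replay_truncate(table, 1, 10)
    return table

print(runtime_outcome())   # {5: 'new'}
print(recovery_outcome())  # {}
```

      The inserted key survives at runtime but is removed on replay, which is the divergence the rest of this report depends on.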

      Then we need to arrange the on-disk state such that the truncate is on disk in one table but not the other; then one table will exhibit the runtime behavior and the other will exhibit the recovery behavior, and the mirroring will fail.

      Since the insert needs to commit first, we can assume it has committed. Then suppose we start a checkpoint, then commit the truncate, and then evict the page containing the write-skew key in table 1 but not in table 2. And suppose all of this happens while the checkpoint is busy elsewhere, so it hasn't yet reached the relevant part of either table.

      At this point, I think what happens is that the evicted page in table 1 makes it into the checkpoint with the truncate applied, but the unevicted page in table 2 is either not written or is written without the truncate, because the truncate isn't visible to the checkpoint.

      Then after the checkpoint we crash out. I think at this point we lose.
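      The interleaving above can be sketched as a toy Python model of two mirrored tables. This is only an encoding of the hypothesis in this report, under loudly stated assumptions: table 1's evicted page carries the runtime truncate into the checkpoint, table 2's checkpoint image predates the truncate, and after the crash only table 2 gets the truncate applied by log replay. It does not model how WiredTiger recovery actually decides what to replay, and every name in it is hypothetical.

```python
# Hypothetical model of the mirrored-table scenario (encodes the report's hypothesis only).

LO, HI = 1, 10  # hypothetical truncate range

def truncate_visible(table, snapshot, lo, hi):
    # Runtime semantics: remove only keys visible to the truncate's snapshot.
    return {k: v for k, v in table.items() if not (lo <= k <= hi and k in snapshot)}

def truncate_range(table, lo, hi):
    # Replay semantics (assumed): remove every key in the range.
    return {k: v for k, v in table.items() if not lo <= k <= hi}

def simulate():
    base = {1: "a", 2: "b", 3: "c"}
    snapshot = set(base)              # truncate's snapshot, taken before the insert commits
    with_insert = {**base, 5: "new"}  # the concurrent insert commits first
    # The checkpoint starts here; the truncate then commits in memory.
    in_memory = truncate_visible(with_insert, snapshot, LO, HI)
    on_disk_t1 = dict(in_memory)      # table 1: page evicted after the truncate committed
    on_disk_t2 = dict(with_insert)    # table 2: checkpoint image without the truncate
    # Crash, then recovery.
    recovered_t1 = on_disk_t1                          # assumption: keeps the runtime result
    recovered_t2 = truncate_range(on_disk_t2, LO, HI)  # assumption: truncate is replayed
    return recovered_t1, recovered_t2

t1, t2 = simulate()
print(t1, t2)  # {5: 'new'} {}
```

      If the assumptions hold, the two mirrors disagree about key 5 after recovery, which is the failure mode described above.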

      Format won't notice or care if the affected key is an insert key (because it doesn't cross-check those or expect them to be consistent across mirrors), but if the insert is re-adding a normal key that was previously deleted, I think it can fail.
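      A small sketch of why the kind of key matters, assuming a simplified mirror check that compares only normal keys and ignores insert keys. The function name and the key sets are hypothetical and do not reflect format's actual verification code.

```python
# Hypothetical mirror check: only normal keys are cross-checked; insert keys are ignored.

def mirrors_match(t1, t2, normal_keys):
    # A missing key compares as None, so a key present in one mirror only is a mismatch.
    return all(t1.get(k) == t2.get(k) for k in normal_keys)

normal = {1, 2, 3, 4, 5}

# Case A: the skewed key is an insert key (7), so the check never looks at it.
assert mirrors_match({7: "x"}, {}, normal)

# Case B: the skewed key is normal key 5, deleted earlier and then re-added.
assert not mirrors_match({5: "x"}, {}, normal)
```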

      Note that this is not a fast-truncate issue; it's an issue with the cursor-level semantics of truncate.

      Since fixing WT-4158 properly is a large can of worms, probably the only solution is to disable truncate when both mirroring and logging are turned on, at least assuming this behavior ever actually manifests in the wild; it has enough prerequisites that it might not. On the other hand, recognizing it when it does happen isn't easy, and nobody wants to spend time investigating what might turn out to be a known issue.
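      The proposed workaround amounts to a configuration rule. A minimal sketch, assuming a hypothetical `FormatConfig` with boolean `mirror`, `logging`, and `truncate` fields; these are illustrative stand-ins, not format's real option names.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class FormatConfig:
    # Hypothetical stand-in for format's run configuration; field names are illustrative.
    mirror: bool
    logging: bool
    truncate: bool

def apply_truncate_workaround(cfg: FormatConfig) -> FormatConfig:
    # Proposed workaround: turn truncate off when mirroring and logging are both on.
    if cfg.mirror and cfg.logging and cfg.truncate:
        return replace(cfg, truncate=False)
    return cfg

cfg = apply_truncate_workaround(FormatConfig(mirror=True, logging=True, truncate=True))
print(cfg.truncate)  # False
```

      Any run with only one of mirroring or logging enabled keeps truncate, so fast-truncate coverage is lost only in the combination this report is concerned with.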

      This is one of those things where I'd love to be wrong, because mirror-format has been a really important test asset for fast-truncate.

            Assignee: Backlog - Storage Engines Team (backlog-server-storage-engines)
            Reporter: David Holland (dholland+wt@sauclovia.org)
            Votes: 0
            Watchers: 6
