  1. WiredTiger
  2. WT-13612

Time aggregate merge logic incorrect for chunk merge, page split scenario

    • 8
    • StorEng - 2024-10-29, StorEng - 2024-11-12
    • v8.0

      Symptom:

      WiredTiger reports invalid / out of order timestamps on update.

      [1729712684:890923][11652:0x71c37a2b1740], WT_SESSION.commit_transaction: [WT_VERB_DEFAULT][ERROR]: int __wt_txn_timestamp_usage_check(WT_SESSION_IMPL *, WT_TXN_OP *, wt_timestamp_t, wt_timestamp_t), 593: file:table2.wt: unexpected timestamp usage: updating a value with a timestamp (0, 1593328) before the previous update (0, 1611559)

      Cause:

      In a less common page split scenario, reconciliation produces two chunks, and the latter of the two is too small to justify its own page but too big to merge back into the previous page. In this case WiredTiger moves some content from the previous page into the latter page and updates the time aggregates of both pages.

      When the bug occurs, the time aggregate applied to the latter page is incorrect.

      Simply put, suppose page prev held keys A->M, keys A->H were kept on that page, and keys I->M moved to latter. The latter page then had the time aggregate of keys A->M applied instead of the time aggregate of keys I->M. The prev page logic was correct and produced the time aggregate for A->H. As a result, the latter page's time aggregate became a combination of (A->M + latter_orig) rather than (I->M + latter_orig), which leaves the latter page with an invalid aggregate.
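
The A->M example can be sketched as a small model. This is only an illustration, not WiredTiger's reconciliation code: the `TimeAggregate` class, its two fields, the `merge` and `aggregate` helpers, and all timestamp values are assumptions chosen to make the effect visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeAggregate:
    """Hypothetical, simplified time aggregate with only two fields."""
    oldest_start_ts: int
    newest_start_ts: int


def aggregate(timestamps):
    """Build an aggregate covering a non-empty list of start timestamps."""
    return TimeAggregate(min(timestamps), max(timestamps))


def merge(a, b):
    """Merge conservatively: keep the widest range covered by either input."""
    return TimeAggregate(
        min(a.oldest_start_ts, b.oldest_start_ts),
        max(a.newest_start_ts, b.newest_start_ts),
    )


# Assumed start timestamps for keys A..M: A=10, B=20, ..., M=130.
start_ts = {key: 10 * (i + 1) for i, key in enumerate("ABCDEFGHIJKLM")}

kept = [start_ts[k] for k in "ABCDEFGH"]   # stays on prev
moved = [start_ts[k] for k in "IJKLM"]     # moves to latter
latter_orig = TimeAggregate(140, 160)      # assumed original content of latter

# prev is handled correctly: it only covers A->H.
prev_ta = aggregate(kept)

# Intended behaviour: latter covers I->M plus its original content.
latter_correct = merge(aggregate(moved), latter_orig)

# Behaviour described in the ticket: latter receives the A->M aggregate.
latter_buggy = merge(aggregate(kept + moved), latter_orig)

print("prev:          ", prev_ta)
print("latter correct:", latter_correct)
print("latter buggy:  ", latter_buggy)
```

In this model the buggy aggregate for latter reaches back to key A's timestamp even though key A lives on prev, so it is wider than the keys on the page justify.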

      Impact:

      Because of the way time aggregates are merged, the most conservative value is chosen for each timestamp. This means that even though the wrong timestamp values were chosen, the result is not data loss; the impact is mainly on performance. Pages with the incorrect aggregate result in:

      1. RTS (rollback to stable) unnecessarily visiting a page, which is slow but not harmful.
      2. The tree walk logic visiting a page that it might not need to. Again, a performance issue.
      3. The checkpoint cleanup logic not being able to clean up the page as soon, which is a performance issue.
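
To illustrate why an overly wide aggregate costs performance rather than correctness, here is a minimal sketch of a skip decision. The `needs_visit` function, the stable timestamp, and the key timestamps are all hypothetical; the real RTS, tree walk, and checkpoint cleanup checks are more involved.

```python
def needs_visit(page_newest_ts: int, stable_ts: int) -> bool:
    """Hypothetical skip check: visit a page only if its aggregate says it
    may hold updates newer than the stable timestamp."""
    return page_newest_ts > stable_ts


stable_ts = 150

# Timestamps of the keys that are really on the page (assumed values).
actual_key_ts = [90, 100, 120]

# Correct aggregate: covers only the keys on the page.
correct_newest = max(actual_key_ts)

# Over-wide aggregate: also covers a key (ts 200) that lives on another page.
overwide_newest = max(actual_key_ts + [200])

visit_correct = needs_visit(correct_newest, stable_ts)
visit_overwide = needs_visit(overwide_newest, stable_ts)

print("visit with correct aggregate:  ", visit_correct)
print("visit with over-wide aggregate:", visit_overwide)
```

Under these assumptions the over-wide aggregate forces a visit that the correct aggregate would skip. It cannot cause a page to be skipped that should have been visited, because it always covers at least the timestamps the correct aggregate covers.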

      No data correctness issues were identified in the lifecycle of the ticket.

        1. WT-13612.workload
          32 kB
          Peter Macko

            Assignee:
            luke.pearson@mongodb.com Luke Pearson
            Reporter:
            Xgen-BuildBaron-User xgen-buildbaron-user
            Votes:
            0
            Watchers:
            10
