The all_committed timestamp should be less than any in-flight transaction


    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 3.6.4, 3.7.3, WT3.1.0
    • Affects Version/s: None
    • Component/s: None
    • Sprint: Storage Non-NYC 2018-03-12
    • None

      Originally we thought it was fine to use min(largest committed timestamp, all active timestamps) for the all_committed timestamp. However, it would be more useful to return:

      min(largest committed timestamp, active timestamp - 1)
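The difference between the two formulas can be illustrated with a small sketch. This is not WiredTiger code: the function names and the use of plain integers for timestamps are assumptions made for illustration, and "active timestamp" is read here as the oldest timestamp among all in-flight transactions.

```python
from typing import Iterable


def all_committed_original(largest_committed: int, active: Iterable[int]) -> int:
    """Original proposal: min(largest committed timestamp, all active timestamps)."""
    return min([largest_committed, *active])


def all_committed_proposed(largest_committed: int, active: Iterable[int]) -> int:
    """Proposed: min(largest committed timestamp, oldest active timestamp - 1).

    The result is strictly less than the timestamp of every in-flight
    transaction. With no in-flight transactions it is simply the largest
    committed timestamp.
    """
    active = list(active)
    if not active:
        return largest_committed
    return min(largest_committed, min(active) - 1)


# Hypothetical state: commits seen up to timestamp 10, while transactions
# with timestamps 7 and 12 are still in flight.
original = all_committed_original(10, [7, 12])
proposed = all_committed_proposed(10, [7, 12])
```

In this hypothetical state the original formula yields 7, which is the timestamp of a transaction that has not committed yet, while the proposed formula yields 6, which lies below every in-flight timestamp.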

      This behavior is useful for replica sets with a single voting node. In such a set, the primary could set the majority commit level as soon as each write is durable, since a majority of 1 is 1. However, because writes commit out of timestamp order, they can also become durable out of timestamp order. We therefore need to set the majority commit level to the latest durable timestamp that has no uncommitted operations with timestamps less than it. The all_committed value can help provide this: in a loop on a thread, we query the current all_committed value X, wait for a log flush, and then mark X as the new majority commit level. After that point, no operation may commit with a timestamp equal to or less than X.
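One iteration of the loop described above might look like the following sketch. The three callbacks are hypothetical stand-ins for whatever the server uses to query all_committed, wait for a log flush, and publish the majority commit level; none of them is a real WiredTiger or MongoDB API.

```python
from typing import Callable


def advance_majority_commit_level(
    query_all_committed: Callable[[], int],
    wait_for_log_flush: Callable[[], None],
    set_majority_commit_level: Callable[[int], None],
) -> int:
    """Run one iteration of the loop and return the timestamp that was published."""
    # 1. Snapshot X *before* waiting, so the flush covers every commit at or below X.
    x = query_all_committed()
    # 2. Block until the log is flushed; commits at or below X are now durable.
    wait_for_log_flush()
    # 3. Publish X as the new majority commit level.
    set_majority_commit_level(x)
    return x
```

The ordering is the point of the sketch: querying X after the flush instead of before it could publish a timestamp whose writes are not yet durable.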

            Assignee: Michael Cahill (Inactive)
            Reporter: Michael Cahill (Inactive)
