Originally we thought it was fine to use min(largest committed timestamp, all active timestamps) for the all_committed timestamp. However, it would be more useful to return:
min(largest committed timestamp, oldest active timestamp - 1)
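For illustration, the proposed value could be computed roughly as follows. This is only a sketch: the function name and arguments are hypothetical, timestamps are assumed to be plain integers, and "active timestamp" is read as the oldest timestamp held by a still-uncommitted operation.

```python
from typing import Iterable


def all_committed(largest_committed: int, active_timestamps: Iterable[int]) -> int:
    """Return the proposed all_committed value (hypothetical helper).

    Assumes integer timestamps. With no active (uncommitted) operations,
    the largest committed timestamp is returned unchanged.
    """
    active = list(active_timestamps)
    if not active:
        return largest_committed
    # Everything strictly below the oldest active timestamp is committed,
    # so step back by one from that timestamp.
    return min(largest_committed, min(active) - 1)
```

With a largest committed timestamp of 10 and active timestamps 7 and 9, this returns 6, whereas the original formula would have returned 7, a timestamp that is itself still uncommitted.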
This behavior will be useful on one-voting-node replica sets. With such sets, the primary node could set the majority commit level as soon as each write is durable, since a majority of 1 is 1. Unfortunately, because writes commit out of timestamp order, they can also become durable out of timestamp order. We therefore need to set the majority commit level to the latest durable timestamp that has no uncommitted operations with timestamps less than it. The all_committed value can help provide this: in a thread loop, we can query the current all_committed value X, wait for a log flush, and then mark X as the new majority commit level. Thereafter, we cannot commit any operation with a timestamp equal to or less than X.
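The thread loop described above might look something like the sketch below. The three callables are hypothetical stand-ins for the storage engine query, the log flush wait, and the replication-side setter; none of them are real API names.

```python
import threading
from typing import Callable


def advance_once(
    query_all_committed: Callable[[], int],
    wait_for_log_flush: Callable[[], None],
    set_majority_commit_level: Callable[[int], None],
) -> int:
    """Run one iteration of the loop; all three callables are hypothetical."""
    # Read X first: no uncommitted operation has a timestamp <= X.
    x = query_all_committed()
    # Once the flush returns, everything committed at or below X is durable.
    wait_for_log_flush()
    # Only now is it safe to publish X as the majority commit level.
    set_majority_commit_level(x)
    return x


def run_loop(
    query_all_committed: Callable[[], int],
    wait_for_log_flush: Callable[[], None],
    set_majority_commit_level: Callable[[int], None],
    stop: threading.Event,
) -> None:
    """Repeat the iteration until the stop event is set."""
    while not stop.is_set():
        advance_once(query_all_committed, wait_for_log_flush, set_majority_commit_level)
```

The ordering matters: X is sampled before the flush, so a write that commits during the flush cannot be included in the published level without being durable.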
- is depended on by SERVER-33743 Use all_committed to set lastApplied on primary nodes (Closed)