  1. WiredTiger
  2. WT-11500

Investigate the possibility of a table that doesn't support timestamps and also isn't logged

    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Storage Engines
    • StorEng - Refinement Pipeline

      Note: this is a question I had for the WiredTiger team about how we could POC a configuration for a WT table/btree that is not logged and also doesn't have timestamped writes. Filing a ticket here at Alex's request so we can preserve any history/context around this.

      My understanding today is that, for historical reasons, logged tables are assumed to be untimestamped and have commit-level durability, whereas not-logged tables have checkpoint-level durability and support timestamps. In the server, we have an interesting use case for a not-logged table that is written to in the same WT transaction as record stores/timestamped tables, but doesn't itself require timestamp versioning (MongoDB only ever reads the most recent version of any record in this table).
      Since this table is written to a lot and its records are frequently updated, we had a theory that writing the update chains for these documents (whose historical versions we'd never need) to the history store was expensive and costing us throughput (e.g. by making checkpointing more expensive). So we experimented with making the table a logged table to exempt it from durable history. There were some potentially promising results: write-heavy workloads with multiple threads showed some throughput improvement. However, there may be a small regression in single-threaded workloads, which makes sense as we're now journaling this data (and in the single-threaded case the overhead of flushing the journal is less 'amortized' over many threads).
      Since checkpoint-level durability is actually fine for this table (i.e. we don't actually need the journal for it; we can get the table back to an acceptable state after recovery from checkpoint using the oplog), we were thinking about how we could make this table untimestamped without journaling it. Unfortunately, it's difficult at the MongoDB layer to untimestamp the writes for this table, as they occur in the same WT txn as timestamped writes to other tables. So we were exploring the possibility of letting a table have the special 'configured' behavior of being untimestamped in the way logged tables are, but without doing any actual logging.
      Looking at the code, we thought that we could avoid timestamping the updates to this table if we added a special flag to the btree for this configuration and then, in __wt_txn_op_set_timestamp here: https://github.com/wiredtiger/wiredtiger/blob/3a816c38e88cdc310afa9d331316389102ed01dc/src/include/txn_inline.h#L372-L374 added an early return when the flag is set, so that timestamps are not copied into updates for this table's btree. Would that work, i.e. achieve our goal of making sure no history is kept for this table while also avoiding journaling? We're just looking for a quick-and-dirty hack to test this theory. Let me know if there are other ideas, if this sounds reasonable, or if I've misunderstood the code completely.
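
      To make the proposal concrete, here is a toy model of the early return in C. This is not WiredTiger code: the toy_* types, the TOY_BTREE_NO_TIMESTAMP flag, and the function signature are all hypothetical stand-ins, and the real __wt_txn_op_set_timestamp does more than this. It only shows the shape of the idea, which is to skip copying the transaction's timestamps into an update when the btree carries the new flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag for illustration; not an existing WiredTiger btree flag. */
#define TOY_BTREE_NO_TIMESTAMP 0x1u

/* Simplified stand-ins for the btree, update, and transaction structures. */
typedef struct {
    uint32_t flags;
} toy_btree;

typedef struct {
    uint64_t start_ts;   /* 0 means "no timestamp" in this toy model */
    uint64_t durable_ts;
} toy_update;

typedef struct {
    bool has_commit_ts;
    uint64_t commit_ts;
    uint64_t durable_ts;
} toy_txn;

/*
 * Toy version of "set the timestamp on a transaction operation".
 * Assumption: an update that never receives a timestamp is treated
 * like an update to a logged (untimestamped) table.
 */
static void
toy_op_set_timestamp(const toy_txn *txn, const toy_btree *btree, toy_update *upd)
{
    /* Proposed early return: flagged btrees never get timestamped updates. */
    if (btree->flags & TOY_BTREE_NO_TIMESTAMP)
        return;

    /* Nothing to copy if the transaction has no commit timestamp. */
    if (!txn->has_commit_ts)
        return;

    /* Normal path: copy the transaction's timestamps into the update. */
    upd->start_ts = txn->commit_ts;
    upd->durable_ts = txn->durable_ts;
}
```

      In this model, two updates written in the same transaction end up different: the one on a flagged btree keeps zero timestamps, while the one on an ordinary btree gets the commit timestamp. Whether leaving the timestamps unset is enough to keep these updates out of the history store in the real code is exactly the open question above.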

            Assignee:
            backlog-server-storage-engines [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            george.wangensteen@mongodb.com George Wangensteen
            Votes:
            0
            Watchers:
            20

              Created:
              Updated: