  1. WiredTiger
  2. WT-1282

Enhance LSM trees to adapt to overflow sizes

    • Type: New Feature
    • Resolution: Won't Do
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None

      While thinking about the page size issue and overflow items, the need to get that right on a server in the face of unknown future inserts from a client, and the performance impact of overflow items, it occurred to me that maybe WT could be adaptive in that respect, at least for LSM trees. I know enough to be dangerous: enough to make this sound good, but not enough to know the edge cases that may make it impossible.

      Each LSM chunk is really its own btree. Is it necessary that each chunk be configured the same? Perhaps if LSM notices that a lot of overflow items or pages are being created, it could use the next chunk switch to increase the leaf_page_max of the new chunk, so that all of the new inserts stay on-page.
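
      To make the "notice" idea concrete, here is a rough Python sketch of per-chunk bookkeeping that could be consulted at chunk switch time. Nothing here is WiredTiger code: `ChunkStats`, `should_grow`, the 5% threshold and the minimum sample size are all hypothetical names and values chosen for illustration, and the overflow threshold is passed in rather than derived from any real configuration.

```python
from dataclasses import dataclass


@dataclass
class ChunkStats:
    """Hypothetical per-chunk counters, reset at every chunk switch."""
    inserts: int = 0
    overflow_items: int = 0
    largest_item: int = 0

    def record(self, item_size: int, overflow_threshold: int) -> None:
        # Count the insert and remember the largest item seen so far.
        self.inserts += 1
        self.largest_item = max(self.largest_item, item_size)
        # An item larger than the threshold would be stored off-page.
        if item_size > overflow_threshold:
            self.overflow_items += 1


def should_grow(stats: ChunkStats,
                max_overflow_fraction: float = 0.05,
                min_inserts: int = 1000) -> bool:
    """Decide whether the next chunk should get a larger leaf page.

    Both defaults are illustrative assumptions: "a lot" is defined here
    as more than 5% of inserts, measured over at least 1000 inserts.
    """
    if stats.inserts < min_inserts:
        return False  # too little data to judge
    return stats.overflow_items / stats.inserts > max_overflow_fraction
```

      The minimum sample size is there so that a handful of large items at the start of a chunk does not trigger a resize on its own.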

      Random thoughts:

      • I said, "if LSM notices that there are a lot of overflow items..." I suspect the two hard parts are "notice" (harder) and defining "a lot" (easier).
      • We'd probably want a config option to enable this.
      • If we were to do this, I think the simplest approach would be a one-way path: go larger only. We'd want to know how large to go immediately, not just use a simplistic calculation like doubling.
      • Other than stats, is there any way for LSM to know an insert results in overflow items?
      • Currently creating a chunk uses lsm_tree->file_config. This is probably the main problem area. The lsm_tree metadata would clearly need to be updated when increasing page size. Each existing chunk already has its own metadata written, so we would know each individual chunk's configuration. Is there a problem, or are there assumptions that break, if an older chunk's configuration doesn't match the current lsm_tree's config?
      • I think merging should just work, as it just walks the source chunks using cursor->next(), and the new destination chunk would have the larger page size.
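
      As a rough illustration of the "go larger only, and go straight to the right size" and per-chunk configuration points above, here is a Python sketch. It is not WiredTiger code. `next_leaf_page_max` and `with_leaf_page_max` are hypothetical helpers; the rule that an item stays on-page when at least `min_items_per_page` of them fit in a leaf page, and the `cap` value, are assumptions for illustration, not WiredTiger's actual overflow rule or limit. The config helper also assumes a flat, comma-separated config string with no nested values.

```python
def next_leaf_page_max(current: int, largest_item: int,
                       min_items_per_page: int = 10,
                       cap: int = 512 * 1024 * 1024) -> int:
    """Pick the leaf page size for the next chunk.

    One-way: the result is never smaller than `current`. The size jumps
    in a single step to the first power-of-two multiple of `current`
    that fits `min_items_per_page` items of the largest observed size
    (an assumed rule), bounded by `cap` (also an assumption).
    """
    needed = largest_item * min_items_per_page
    size = current
    while size < needed and size < cap:
        size *= 2
    return min(size, cap)


def with_leaf_page_max(file_config: str, leaf_page_max: int) -> str:
    """Return a copy of a flat config string with leaf_page_max replaced.

    Any existing leaf_page_max entry is dropped so the result does not
    depend on how duplicate keys would be resolved.
    """
    parts = [p for p in file_config.split(",")
             if p and not p.strip().startswith("leaf_page_max=")]
    parts.append(f"leaf_page_max={leaf_page_max}")
    return ",".join(parts)
```

      In this sketch the tree-level config string would be rewritten once at chunk switch, and older chunks would keep whatever configuration they were created with.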

            Assignee:
            backlog-server-storage-engines [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            sue.loverso@mongodb.com Susan LoVerso
