  1. Drivers
  2. DRIVERS-1492

Clustered indexes for Time-series collections

    • Type: Epic
    • Resolution: Won't Do
    • Priority: Major - P3
    • None
    • Component/s: None
    • Needed

      Downstream Change Summary

      There may be downstream impacts for this project. We will update with potential impacts after we move to design.

      Description of Linked Ticket

      Epic Summary

      Make the RecordStore for a collection a mapping from _id key to BSON document, instead of a mapping from RecordId to BSON document. This will then allow us to remove the separate _id index.
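      The difference between the two layouts can be sketched with a toy model. This is only an illustration, assuming plain Python dicts stand in for the storage engine's tables; the class names, method names, and the `reads` counter are hypothetical and are not taken from the server code.

```python
# Toy model only: plain dicts stand in for the storage engine's tables.
# None of these class or method names come from the MongoDB code base.

class RecordIdKeyedCollection:
    """Current layout: RecordStore keyed by RecordId, plus a separate _id index."""

    def __init__(self):
        self.record_store = {}   # RecordId -> document
        self.id_index = {}       # _id -> RecordId
        self.next_record_id = 1
        self.reads = 0           # counts table lookups, for illustration

    def insert(self, doc):
        rid = self.next_record_id
        self.next_record_id += 1
        self.record_store[rid] = doc      # write 1: the document
        self.id_index[doc["_id"]] = rid   # write 2: the _id index entry

    def find_by_id(self, _id):
        self.reads += 1
        rid = self.id_index[_id]          # read 1: index lookup
        self.reads += 1
        return self.record_store[rid]     # read 2: fetch the document


class ClusteredCollection:
    """Proposed layout: RecordStore keyed directly by _id, no _id index."""

    def __init__(self):
        self.record_store = {}   # _id -> document
        self.reads = 0

    def insert(self, doc):
        self.record_store[doc["_id"]] = doc   # single write

    def find_by_id(self, _id):
        self.reads += 1
        return self.record_store[_id]         # single read


old, new = RecordIdKeyedCollection(), ClusteredCollection()
for coll in (old, new):
    coll.insert({"_id": 42, "temp": 21.5})
    coll.find_by_id(42)
print(old.reads, new.reads)
```

      In this model a lookup by _id touches two tables in the current layout and one in the proposed layout, and an insert writes to two tables versus one.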

      Motivation

      • Queries that currently use the _id index will only need to do one read instead of two.
      • Inserts and deletes will have one fewer index to update.
      • Range queries on _id will be able to use a collection scan rather than an index scan with a fetch stage. This allows far more efficient sequential storage access where random-order access is currently required.
      • Range removes on _id allow efficient truncation that avoids reading the data being removed, provided there are no other indexes.
      • Chunk migrations on sharded clusters will be significantly faster.
      • MongoDB will be easier to use for time-series data.
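      The range-query and range-remove points above can be illustrated with a small sketch. It assumes a toy store that keeps documents in a Python list sorted by unique _id values; `ClusteredStore`, `range_scan`, and `range_delete` are hypothetical names, and a real storage engine would use an on-disk ordered structure rather than in-memory lists.

```python
import bisect


class ClusteredStore:
    """Toy store that keeps documents physically ordered by _id (assumed unique)."""

    def __init__(self):
        self.keys = []   # sorted _id values
        self.docs = []   # documents, in the same order as keys

    def insert(self, doc):
        pos = bisect.bisect_left(self.keys, doc["_id"])
        self.keys.insert(pos, doc["_id"])
        self.docs.insert(pos, doc)

    def range_scan(self, lo, hi):
        """Return documents with lo <= _id < hi as one contiguous slice."""
        start = bisect.bisect_left(self.keys, lo)
        end = bisect.bisect_left(self.keys, hi)
        return self.docs[start:end]

    def range_delete(self, lo, hi):
        """Drop documents with lo <= _id < hi without inspecting their contents."""
        start = bisect.bisect_left(self.keys, lo)
        end = bisect.bisect_left(self.keys, hi)
        del self.keys[start:end]
        del self.docs[start:end]
        return end - start


store = ClusteredStore()
for ts in (30, 10, 50, 20, 40):
    store.insert({"_id": ts, "value": ts * 2})

scanned = [d["_id"] for d in store.range_scan(20, 50)]
removed = store.range_delete(0, 30)
```

      Because neighbouring _id values sit next to each other, both operations locate two boundaries and then work on one contiguous run, instead of following an index entry to a separate location for every document.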

      Doing a smaller, internal-only project first leaves full support for sharded collections with shard keys other than _id as future work. This helps evaluate the performance gains before committing to that more expensive work, and allows its design questions to be answered.

            Assignee: Unassigned
            Reporter: Backlog - Core Eng Program Management Team (backlog-server-pm)
            Votes: 0
            Watchers: 2

              Created:
              Updated:
              Resolved: