  1. Drivers
  2. DRIVERS-1853

Clustered Indexes for all Collections

    • Needed

      We're planning to add new fields to command responses and change output of listIndexes. Scope will have more details.


      Drivers will need to:

      • add the clusteredIndex option for createCollection
      • add the clustered field in the output of listIndexes
      • sync collection-management tests to b042e47

      Update: "serverless: forbid" was added in this commit.
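
      As a rough illustration of the two driver-facing changes listed above, the sketch below builds a create command document that carries a clusteredIndex option and picks out the entry marked clustered from a listIndexes-style result. The helper names (build_create_command, find_clustered_index), the collection name, and the sample index documents are hypothetical. The shape of the clusteredIndex value (key, unique, name) is an assumption based on the server's documented option, not something this ticket specifies.

```python
from typing import Any, Dict, List, Optional


def build_create_command(
    name: str, clustered_index: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Build a `create` command document; add clusteredIndex only when given."""
    cmd: Dict[str, Any] = {"create": name}
    if clustered_index is not None:
        cmd["clusteredIndex"] = clustered_index
    return cmd


def find_clustered_index(indexes: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the first index document whose `clustered` field is true, if any."""
    for spec in indexes:
        if spec.get("clustered") is True:
            return spec
    return None


# Assumed option shape: key, unique, and an optional name.
create_cmd = build_create_command(
    "orders",
    {"key": {"_id": 1}, "unique": True, "name": "orders_clustered"},
)

# Hypothetical listIndexes output, for illustration only.
sample_indexes = [
    {"v": 2, "key": {"_id": 1}, "name": "orders_clustered",
     "unique": True, "clustered": True},
    {"v": 2, "key": {"status": 1}, "name": "status_1"},
]
clustered = find_clustered_index(sample_indexes)
```

      A driver would normally pass the option through its own createCollection helper and surface the clustered field unchanged; the plain dictionaries here only show where the new fields sit.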

      Key            Status/Resolution  FixVersion
      CDRIVER-4359   Fixed              1.22.0, 1.22.0-beta0
      CXX-2491       Done               3.8.0
      CSHARP-4141    Fixed              2.16.0
      GODRIVER-2383  Done               1.10.0, 1.10.0-beta1
      JAVA-4576      Fixed              4.7.0
      NODE-4189      Fixed              4.6.0
      MOTOR-935      Won't Do
      PYTHON-3227    Done
      PHPLIB-843     Fixed              1.13.0-beta1, 1.13.0
      RUBY-2959      Fixed              2.18.0
      RUST-1271      Fixed              2.3.0
      SWIFT-1546     Won't Fix

      Downstream Change Summary

      We're planning to add new fields to command responses and change output of listIndexes. Scope will have more details.

      Description of Linked Ticket

      Epic Summary

      Summary

      Without clustering, a collection is stored in a B-Tree keyed by a RecordId that is not exposed to end users, and there is a separate primary key index (<primary key>, <RecordId>). With clustering, a collection is stored in a B-Tree keyed by the collection’s primary key, and there is no primary key index. This project is a generalization of clustering for time series (PM-288), and will need to support upgrading existing collections to use clustering.
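
      The difference between the two layouts can be sketched with a toy in-memory model. This is only an illustration under simplifying assumptions: the class names are hypothetical, a dict plus a sorted key list stands in for a B-Tree, and the primary key is taken to be the _id field.

```python
import bisect


class NonClusteredCollection:
    """Documents keyed by an internal RecordId, plus a primary key index."""

    def __init__(self):
        self._next_record_id = 1
        self.records = {}   # RecordId -> document
        self.pk_index = {}  # primary key -> RecordId

    def insert(self, doc):
        rid = self._next_record_id
        self._next_record_id += 1
        self.records[rid] = doc
        self.pk_index[doc["_id"]] = rid

    def find_by_pk(self, pk):
        rid = self.pk_index[pk]   # step 1: primary key index lookup
        return self.records[rid]  # step 2: fetch the record by RecordId


class ClusteredCollection:
    """Documents keyed directly by primary key; no primary key index."""

    def __init__(self):
        self.keys = []  # primary keys, kept sorted
        self.docs = {}  # primary key -> document

    def insert(self, doc):
        pk = doc["_id"]
        if pk not in self.docs:
            bisect.insort(self.keys, pk)
        self.docs[pk] = doc

    def find_by_pk(self, pk):
        return self.docs[pk]  # single lookup, no intermediate RecordId

    def range_scan(self, lo, hi):
        """Return documents with lo <= _id < hi in primary key order."""
        start = bisect.bisect_left(self.keys, lo)
        end = bisect.bisect_left(self.keys, hi)
        return [self.docs[k] for k in self.keys[start:end]]


plain, clustered = NonClusteredCollection(), ClusteredCollection()
for pk in (30, 10, 20):
    plain.insert({"_id": pk})
    clustered.insert({"_id": pk})
```

      In the non-clustered model every primary key lookup touches two structures, while the clustered model touches one and can serve a primary key range scan directly from the ordered keys.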

      Motivation

      Clustering by primary key is important for fast scale in/out in Serverless. This is largely because split and merge, which will do a physical copy such as file copy, will replace tenant migration/chunk migration, which does a logical copy.

      • If a tenant does not have local secondary indexes (e.g., only has global indexes), orphan cleanup can be done using truncate rather than individual document deletes. Orphan filtering is expensive, so fast orphan cleanup is particularly important when doing a physical copy. This is because with a logical copy, the recipient can only end up with orphans in the range being transferred, but with a physical copy, the recipient can end up with orphans outside the range being transferred (i.e., more orphans). Orphans also block the merge of two slices that were split from each other, since merge has to be on disjoint ranges.
      • WT data tables for disjoint primary key ranges can be presented as a single table in constant time, for example by adding a root node above the two tables. This can significantly speed up merge, especially if combined with providing a union-view over any local secondary index tables. The tables can actually be merged into one file in the background.
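
      The idea of presenting two tables with disjoint primary key ranges as one table can be sketched as follows. This is a toy model, not WiredTiger's mechanism: SortedTable and UnionRoot are hypothetical names, and sorted Python lists stand in for on-disk tables. Building the root copies no data and compares only the boundary keys.

```python
import bisect


class SortedTable:
    """A stand-in for one data table: keys kept in sorted order."""

    def __init__(self, items):
        pairs = sorted(items, key=lambda kv: kv[0])
        self.keys = [k for k, _ in pairs]
        self.values = [v for _, v in pairs]

    def get(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None


class UnionRoot:
    """A root placed above two tables with disjoint key ranges."""

    def __init__(self, low, high):
        # Only the boundary keys are compared, so construction is O(1).
        if low.keys and high.keys and low.keys[-1] >= high.keys[0]:
            raise ValueError("key ranges must be disjoint and ordered")
        self.low, self.high = low, high
        self.split_key = high.keys[0] if high.keys else None

    def get(self, key):
        # Route the lookup to whichever table owns the key's range.
        if self.split_key is not None and key >= self.split_key:
            return self.high.get(key)
        return self.low.get(key)

    def scan(self):
        # Ordered scan over both tables without merging them.
        yield from zip(self.low.keys, self.low.values)
        yield from zip(self.high.keys, self.high.values)


left = SortedTable([(3, "c"), (1, "a")])
right = SortedTable([(10, "j"), (12, "l")])
merged = UnionRoot(left, right)
```

      If the ranges overlapped, for example because of orphans, the constructor would reject the pair, which mirrors the point above that merge has to be on disjoint ranges.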

      General benefits of clustering include:

      • Faster lookup and range scans by primary key because you don't need to go through the primary key index.
      • Faster orphan filtering for covered local index queries because local index entries contain the primary key.

      One downside is that clustering may consume more space when there are local secondary indexes, since a separate primary key index reduces the number of copies of each primary key value that must be stored.

      Cast of Characters

      Documentation

      Product Description
      Scope Document
      Technical Design Document

            Assignee:
            abraham.egnor@mongodb.com Abraham Egnor
            Reporter:
            backlog-server-pm Backlog - Core Eng Program Management Team
            Votes:
            0
            Watchers:
            7

              Created:
              Updated:
              Resolved: