  1. Core Server
  2. SERVER-75604

Eliminate CollectionScanNode.filter when not needed for clustered collection scans

    • Query Optimization
    • Fully Compatible
    • QO 2023-10-16, QO 2023-10-30, QO 2023-11-13, QO 2023-11-27

      A QuerySolutionNode.filter is always generated for clustered collection scans whose bounds come from query expressions, apparently solely to distinguish < from <= and > from >=. In these cases the scan itself is bounds-inclusive, and the filter then eliminates any records that match a bound that is actually exclusive.

      For example, a query like this against a clustered collection always generates a filter:

      db.ni.find({$and: [{_id: {$gt: 1}}, {_id: {$lt: 3}}]})
      

      However, if the bounds are specified via the "min" (always inclusive) and "max" (always exclusive) options, the plan does not generate a filter, and the scan operator is expected to enforce the correct bounds itself. For example, a query like the following against a clustered collection does NOT generate a filter:

      db.ni.find().min({_id: 1}).max({_id: 2}).hint({_id: 1})
      

      Given that it is trivially easy to enforce the correct bounds inside the scan operator, and that the operator is already responsible for doing so in the min-max case, the optimizer should stop generating collection scan filters whose only purpose is to enforce whether a scan bound is inclusive or exclusive.
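      As a rough illustration of how little work this is, the sketch below enforces inclusive and exclusive bounds inside the scan loop, with no separate filter stage. It is not the server's scan stage: ScanBounds and boundedScan are hypothetical names, and the record ids are modeled as sorted 64-bit integers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the scan bounds; not the server's CollectionScanParams.
struct ScanBounds {
    std::int64_t lower;
    std::int64_t upper;
    bool lowerInclusive;
    bool upperInclusive;
};

// Returns the keys of `sortedKeys` (ascending order assumed) that fall within `bounds`.
// Inclusive vs. exclusive is decided here, so no post-scan filter is needed.
std::vector<std::int64_t> boundedScan(const std::vector<std::int64_t>& sortedKeys,
                                      const ScanBounds& bounds) {
    assert(bounds.lower <= bounds.upper);
    std::vector<std::int64_t> out;
    for (std::int64_t key : sortedKeys) {
        const bool pastLower =
            bounds.lowerInclusive ? key >= bounds.lower : key > bounds.lower;
        if (!pastLower) {
            continue;  // Still before the start of the range.
        }
        const bool beforeUpper =
            bounds.upperInclusive ? key <= bounds.upper : key < bounds.upper;
        if (!beforeUpper) {
            break;  // Keys are sorted, so nothing later can match.
        }
        out.push_back(key);
    }
    return out;
}
```

      With keys 0 through 4, bounds {1, 3, false, false} model the $gt/$lt query above and yield only 2, while {1, 2, true, false} model the min/max query and yield only 1.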

      This optimization may also be applicable to index scans that have been decomposed into one or more intervals.

      The scan operator will need to know whether the lower and upper bounds are inclusive or exclusive. CollectionScanParams (collection_scan_common.h) has a type that is used in plan nodes to indicate this, although it is a bit hard to consume:

          enum class ScanBoundInclusion {
              kExcludeBothStartAndEndRecords,
              kIncludeStartRecordOnly,
              kIncludeEndRecordOnly,
              kIncludeBothStartAndEndRecords,
          };
      

      It would be easier to consume if it were just two booleans, like:

      // A scan bound is exclusive if the respective flag is false and inclusive if it is true.
      bool scanLowerBoundInclusive;
      bool scanUpperBoundInclusive; 
      

      Whether booleans or the existing enum are used, they must be parameterized in the SBE plan cache, so that cached plans do not have inclusive vs exclusive information permanently baked in and can instead be correctly reused at runtime for queries that have different bounds. (I do not know if this is already the case with the CollectionScanParams::ScanBoundInclusion CollectionScanNode.boundInclusion parameter.)
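
      The toy model below illustrates the parameterization concern only; it does not reflect how the SBE plan cache is implemented. It assumes that two range queries differing only in bound values and inclusivity resolve to the same cache entry. PlanCache, CachedScanPlan, ScanParams and the shape key string are all hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical runtime parameters, bound at execution time rather than at plan time.
struct ScanParams {
    std::int64_t lower;
    std::int64_t upper;
    bool lowerInclusive;
    bool upperInclusive;
};

// Hypothetical cached plan: it stores neither bound values nor inclusivity.
struct CachedScanPlan {
    std::vector<std::int64_t> execute(const std::vector<std::int64_t>& sortedKeys,
                                      const ScanParams& params) const {
        std::vector<std::int64_t> out;
        for (std::int64_t key : sortedKeys) {
            const bool pastLower =
                params.lowerInclusive ? key >= params.lower : key > params.lower;
            const bool beforeUpper =
                params.upperInclusive ? key <= params.upper : key < params.upper;
            if (pastLower && beforeUpper) {
                out.push_back(key);
            }
        }
        return out;
    }
};

// Toy cache keyed by a query-shape string (an assumption of this sketch).
class PlanCache {
public:
    // Returns the cached plan for the shape, creating it on first use.
    CachedScanPlan& lookupOrCompile(const std::string& shapeKey) {
        return _plans[shapeKey];
    }
    std::size_t size() const {
        return _plans.size();
    }

private:
    std::map<std::string, CachedScanPlan> _plans;
};
```

      In this model, a $gt/$lt query and a $gte/$lte query over the same field reuse one entry and still return different results, because inclusivity arrives through ScanParams instead of being stored in the plan.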

      FYI david.storch@mongodb.com, hana.pearlman@mongodb.com, amr.elhelw@mongodb.com

            Assignee:
            james.harrison@mongodb.com James Harrison
            Reporter:
            kevin.cherkauer@mongodb.com Kevin Cherkauer