Loading...

XML

Word

Printable

JSON

Type: Improvement
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 8.2.0-rc0
Affects Version/s: None
Component/s: Querying
Labels:
- qopt-team

Assigned Teams:

Query Optimization
Backwards Compatibility:
Fully Compatible
Sprint:
QO 2022-09-19
Case:
Confidence Status:
None
Work Order:
3

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

The DISTINCT_SCAN stage implements an optimized specialization of an index scan which is appropriate for a subset of distinct or aggregate operations. It involves skipping duplicate keys via an index seek.

The planning logic for DISTINCT_SCAN is implemented outside of the planner. It involves first invoking QueryPlanner::plan(), and then seeing if any of the resulting plans can be correctly converted to a DISTINCT_SCAN. However, as soon as we are able to construct our first DISTINCT_SCAN plan, we pass it off to the execution engine without considering other candidates. See https://github.com/mongodb/mongo/blob/48cd578fa9c3ef317666ca475f9ee14c1fe0bc4f/src/mongo/db/query/get_executor.cpp#L1535-L1556. It is possible that there are multiple DISTINCT_SCAN plans, and that one will outperform another. The efficiency of the DISTINCT_SCAN relates to position of the field we're "distincting" in the index key pattern, as well as the number of unique values in the collection for the preceding key pattern fields. By simply selecting the first DISTINCT_SCAN, we might select a plan that is substantially suboptimal.

Instead, we should generate a set of DISTINCT_SCAN candidate plans. These candidates could then be scored and ranked according to our usual multi-planning algorithm.

A few additional concerns that come to mind:

Is there any reason to generate IXSCAN candidates alongside the DISTINCT_SCAN candidates, or is it safe to assume that DISTINCT_SCAN should always be preferred?
How hard would it be to move the DISTINCT_SCAN logic into QueryPlanner::plan()? It's always bothered me that we have such heavy query planning logic living outside of the planning module.

depends on

SERVER-72748 Enable feature flag

Closed

is related to

SERVER-22460 Logic for applying distinct scan and count scan optimizations should be moved to data access planner

Backlog

SERVER-14227 add index hint support for distinct command

Closed

related to

SERVER-92456 Multiplan DISTINCT_SCANs

Closed

Assignee:: Alya Berciu
Reporter:: David Storch
Participants:: Alya Berciu, Brenda Rodriguez, Charlie Swanson, David Storch, Joel Redman
Votes:: 1 Vote for this issue
Watchers:: 29 Start watching this issue

Created:: Jan 21 2020 11:51:37 PM UTC
Updated:: Apr 22 2025 08:17:46 AM UTC
Resolved:: Apr 22 2025 08:17:45 AM UTC
Confidence Status Last Update:: 16/Mar/22 6:45 PM

Details

Description

Attachments

Issue Links

Activity

People

Dates