- Type: Investigation
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- None
- Developer Tools
- Not Needed
While there is no direct user impact, customers may in some cases no longer see a distinct scan, so performance may differ (but the results will be correct!). Two fields will also be added to the distinct scan stage in explain output (no existing fields are changed or removed); see the design for details.
Description of Linked Ticket
Motivation
Fix a query correctness bug. As described in SERVER-42160 (and its comments), plans using DISTINCT_SCAN on sharded collections may incorrectly return orphaned documents because no SHARDING_FILTER is included.
Summary
Incorporate shard filtering into the DISTINCT_SCAN. The most difficult case is when the shard key is not part of the index being scanned; each step of the DISTINCT_SCAN must also fetch the full document and apply the shard filter to decide whether the current value is valid or is an orphan that should be skipped. Note that there is a risk here that the DISTINCT_SCAN will become slower than a regular IXSCAN in cases where there are few duplicates, so we should test that the multiplanner is trialing both options.
DISTINCT_SCAN is currently only implemented in the classic query planner and execution engine, so we'd either implement this in classic or expand the scope to implement DISTINCT_SCAN in SBE + stagebuilders.
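The hardest case described in the summary (shard key not in the scanned index) can be illustrated with a minimal Python sketch. This is not the server's implementation: the function name `shard_filtered_distinct`, the in-memory `index` list, the `docs` mapping, and the `owns` predicate are all hypothetical stand-ins for the index, the fetch step, and the shard filter. It only shows the control flow: fetch each candidate, skip it if it is an orphan, and jump past remaining duplicates once an owned document is found.

```python
from bisect import bisect_right


def shard_filtered_distinct(index, docs, owns):
    """Toy model of a distinct scan with an embedded shard filter.

    index: list of (key, doc_id) pairs sorted by key (hypothetical index).
    docs:  mapping doc_id -> full document (stands in for the fetch).
    owns:  predicate on a full document; True if this shard owns it.
    Returns (distinct_keys, fetch_count).
    """
    keys = [key for key, _ in index]
    results = []
    fetches = 0
    i = 0
    while i < len(index):
        key, doc_id = index[i]
        fetches += 1  # the shard key is not in the index, so fetch the document
        if owns(docs[doc_id]):
            results.append(key)
            # Owned document found: skip every remaining entry with this key.
            i = bisect_right(keys, key, lo=i)
        else:
            # Orphan: try the next entry, which may carry the same key.
            i += 1
    return results, fetches
```

In this toy model, a key whose entries are all orphans costs one fetch per entry, and a key with no duplicates costs a fetch with nothing to skip. Those are the cases where the summary warns that the scan could end up slower than a regular IXSCAN.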
Documentation