-
Type: Improvement
-
Resolution: Unresolved
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: Aggregation Framework, Querying
-
Query Optimization
-
Query Optimization 2021-05-17
This is an extension of SERVER-9507, which allowed $group to be executed using a DISTINCT_SCAN stage. A distinct command over a multikey field can DISTINCT_SCAN a multikey index:
> db.c.drop() true > db.c.createIndex({a: 1}) { "createdCollectionAutomatically" : true, "numIndexesBefore" : 1, "numIndexesAfter" : 2, "ok" : 1 } > db.c.insert({a: [1, 2, 3]}) WriteResult({ "nInserted" : 1 }) > db.c.insert({a: [2, 3, 4]}) WriteResult({ "nInserted" : 1 }) > db.c.explain().distinct("a") { "queryPlanner" : { "plannerVersion" : 1, "namespace" : "test.c", "indexFilterSet" : false, "parsedQuery" : { }, "queryHash" : "840B4B0E", "winningPlan" : { "stage" : "PROJECTION", "transformBy" : { "_id" : 0, "a" : 1 }, "inputStage" : { "stage" : "DISTINCT_SCAN", "keyPattern" : { "a" : 1 }, "indexName" : "a_1", "isMultiKey" : true, "multiKeyPaths" : { "a" : [ "a" ] }, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 2, "direction" : "forward", "indexBounds" : { "a" : [ "[MinKey, MaxKey]" ] } } }, "rejectedPlans" : [ ] }, "serverInfo" : { "host" : "storchbox", "port" : 27017, "version" : "0.0.0", "gitVersion" : "unknown" }, "ok" : 1 }
When the same distinct operation is expressed using the aggregation framework, however, the $unwind inhibits the DISTINCT_SCAN optimization added in SERVER-9507:
> db.c.explain().aggregate([{$unwind: {path: "$a", preserveNullAndEmptyArrays: true}}, {$group: {_id: "$a"}}]) { "stages" : [ { "$cursor" : { "query" : { }, "fields" : { "a" : 1, "_id" : 0 }, "queryPlanner" : { "plannerVersion" : 1, "namespace" : "test.c", "indexFilterSet" : false, "parsedQuery" : { }, "queryHash" : "8B3D4AB8", "winningPlan" : { "stage" : "COLLSCAN", "direction" : "forward" }, "rejectedPlans" : [ ] } } }, { "$unwind" : { "path" : "$a", "preserveNullAndEmptyArrays" : true } }, { "$group" : { "_id" : "$a" } } ], "ok" : 1 }
This task is to add logic in order to optimize $unwind-$group pipelines to use DISTINCT_SCAN when appropriate. It is closely related to SERVER-27494, which describes how $unwind can sometimes be optimized away even without generating a DISTINCT_SCAN plan.
- is depended on by
-
SERVER-68567 distinct command on the Time-Series collection doesn't utilize the existing index
- Backlog
- is related to
-
SERVER-27494 Avoid unwind on multi-key index field for covered index aggregation
- Backlog
-
SERVER-9507 Optimize $sort+$group+$first pipeline to avoid full index scan
- Closed
- related to
-
SERVER-87741 Make distinct command on views use DISTINCT_SCAN
- Backlog
-
SERVER-55112 Behaviour of distinct differs between collections and views
- Closed
-
SERVER-27915 Make $group with $addToSet accumulator use DISTINCT_SCAN when applicable
- Backlog