Loading...

XML

Word

Printable

JSON

Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Aggregation Framework, Querying
Labels:
- QFB
- asya
- performance

Assigned Teams:

Query Optimization
Sprint:
Query Optimization 2021-05-17
Confidence Status:
None
Work Order:
3

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

This is an extension of ~~SERVER-9507~~, which allowed $group to be executed using a DISTINCT_SCAN stage. A distinct command over a multikey field can DISTINCT_SCAN a multikey index:

> db.c.drop()
true
> db.c.createIndex({a: 1})
{
	"createdCollectionAutomatically" : true,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
> db.c.insert({a: [1, 2, 3]})
WriteResult({ "nInserted" : 1 })
> db.c.insert({a: [2, 3, 4]})
WriteResult({ "nInserted" : 1 })
> db.c.explain().distinct("a")
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.c",
		"indexFilterSet" : false,
		"parsedQuery" : {

		},
		"queryHash" : "840B4B0E",
		"winningPlan" : {
			"stage" : "PROJECTION",
			"transformBy" : {
				"_id" : 0,
				"a" : 1
			},
			"inputStage" : {
				"stage" : "DISTINCT_SCAN",
				"keyPattern" : {
					"a" : 1
				},
				"indexName" : "a_1",
				"isMultiKey" : true,
				"multiKeyPaths" : {
					"a" : [
						"a"
					]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"a" : [
						"[MinKey, MaxKey]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "storchbox",
		"port" : 27017,
		"version" : "0.0.0",
		"gitVersion" : "unknown"
	},
	"ok" : 1
}

When the same distinct operation is expressed using the aggregation framework, however, the $unwind inhibits the DISTINCT_SCAN optimization added in ~~SERVER-9507~~:

> db.c.explain().aggregate([{$unwind: {path: "$a", preserveNullAndEmptyArrays: true}}, {$group: {_id: "$a"}}])
{
	"stages" : [
		{
			"$cursor" : {
				"query" : {

				},
				"fields" : {
					"a" : 1,
					"_id" : 0
				},
				"queryPlanner" : {
					"plannerVersion" : 1,
					"namespace" : "test.c",
					"indexFilterSet" : false,
					"parsedQuery" : {

					},
					"queryHash" : "8B3D4AB8",
					"winningPlan" : {
						"stage" : "COLLSCAN",
						"direction" : "forward"
					},
					"rejectedPlans" : [ ]
				}
			}
		},
		{
			"$unwind" : {
				"path" : "$a",
				"preserveNullAndEmptyArrays" : true
			}
		},
		{
			"$group" : {
				"_id" : "$a"
			}
		}
	],
	"ok" : 1
}

This task is to add logic in order to optimize $unwind-$group pipelines to use DISTINCT_SCAN when appropriate. It is closely related to SERVER-27494, which describes how $unwind can sometimes be optimized away even without generating a DISTINCT_SCAN plan.

is depended on by

SERVER-68567 distinct command on the Time-Series collection doesn't utilize the existing index

Backlog

is related to

SERVER-27494 Avoid unwind on multi-key index field for covered index aggregation

Backlog

SERVER-9507 Optimize $sort+$group+$first pipeline to avoid full index scan

Closed

related to

SERVER-55112 Behaviour of distinct differs between collections and views

Closed

SERVER-27915 Make $group with $addToSet accumulator use DISTINCT_SCAN when applicable

Backlog

SERVER-87741 Make distinct command on views use DISTINCT_SCAN

Backlog

(1 related to)

Assignee:: [DO NOT USE] Backlog - Query Optimization
Reporter:: David Storch
Participants:: [DO NOT USE] Backlog - Query Optimization, Charlie Swanson, David Storch
Votes:: 0 Vote for this issue
Watchers:: 11 Start watching this issue

Created:: Oct 23 2018 03:51:05 PM UTC
Updated:: Jul 22 2024 07:48:34 PM UTC

Details

Description

Attachments

Issue Links

Activity

People

Dates