
Type: Improvement
Resolution: Won't Do
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Aggregation Framework
Labels:
- neweng
- optimization

Assigned Teams:
Query Optimization
Confidence Status:
None
Work Order:
3

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

The optimized code path that uses a random cursor to implement a $sample stage currently appends a FETCH stage on top of the index scan unconditionally: https://github.com/mongodb/mongo/blob/r3.4.0-rc2/src/mongo/db/pipeline/pipeline_d.cpp#L253-L254

This FETCH is unnecessary when the rest of the pipeline needs only the _id field to compute its result, as in the following example:

> db.foo.drop();
true
> for (var i = 0; i < 10000; i++) { db.foo.insert({_id: i}); }
WriteResult({ "nInserted" : 1 })
> db.foo.explain().aggregate([{$sample: {size: 10}}, {$bucketAuto: {groupBy: "$_id", buckets: 2}}])
{
	"stages" : [
		{
			"$cursor" : {
				"query" : {
					
				},
				"fields" : {
					"_id" : 1
				},
				"queryPlanner" : {
					"plannerVersion" : 1,
					"namespace" : "test.foo",
					"indexFilterSet" : false,
					"winningPlan" : {
						"stage" : "FETCH",  // This FETCH stage is not necessary.
						"inputStage" : {
							"stage" : "INDEX_ITERATOR"
						}
					},
					"rejectedPlans" : [ ]
				}
			}
		},
		{
			"$sampleFromRandomCursor" : {
				"size" : NumberLong(10)
			}
		},
		{
			"$bucketAuto" : {
				"groupBy" : "$_id",
				"buckets" : 2,
				"output" : {
					"count" : {
						"$sum" : {
							"$const" : 1
						}
					}
				}
			}
		}
	],
	"ok" : 1,
	"operationTime" : Timestamp(0, 0)
}
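To spot this plan shape programmatically, one can walk the winningPlan tree by following its "inputStage" links. The sketch below is plain JavaScript runnable under Node.js. The helper names planStageChain and hasFetchOverIndexIterator are hypothetical (they are not part of MongoDB), and the code assumes the single-child explain layout shown above; plans with multiple children ("inputStages") are not handled.

```javascript
// Hypothetical helper (not a MongoDB API): walk an explain() winningPlan by
// following "inputStage" links and collect stage names from root to leaf.
// Assumes each stage has at most one child, as in the explain output above.
function planStageChain(winningPlan) {
  const chain = [];
  let stage = winningPlan;
  while (stage) {
    chain.push(stage.stage);
    stage = stage.inputStage;
  }
  return chain;
}

// Hypothetical helper: true when a FETCH sits directly on an INDEX_ITERATOR,
// the shape produced by the $sample random-cursor path in the output above.
function hasFetchOverIndexIterator(winningPlan) {
  const chain = planStageChain(winningPlan);
  for (let i = 0; i + 1 < chain.length; i++) {
    if (chain[i] === "FETCH" && chain[i + 1] === "INDEX_ITERATOR") {
      return true;
    }
  }
  return false;
}

// The winningPlan from the explain output above, copied as a plain object.
const winningPlan = {
  stage: "FETCH",
  inputStage: { stage: "INDEX_ITERATOR" },
};

console.log(planStageChain(winningPlan)); // [ 'FETCH', 'INDEX_ITERATOR' ]
console.log(hasFetchOverIndexIterator(winningPlan)); // true
```

With the explain layout shown above, the plan object would be found at stages[0]["$cursor"].queryPlanner.winningPlan; other server versions may lay out aggregate explain output differently.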

This optimization is valid if the pipeline either has no dependencies or depends only on _id. To check this, the pipeline's dependency analysis must be moved so that it runs before the $sample stage is handled. If the only dependency is _id, a PROJECTION stage must also be added to transform the index key into a full document.
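The proposed decision can be modeled in a few lines. This is a minimal sketch in plain JavaScript, not the pipeline_d.cpp implementation: planForRandomCursor, indexKeyToDocument, and the deps shape ({ needWholeDocument, fields }) are all hypothetical stand-ins for the real dependency analysis. It also assumes the _id index key is represented with an empty field name (e.g. { "": 42 }), which the PROJECTION would turn back into { _id: 42 }. Stage lists are ordered root first, matching explain output.

```javascript
// Hypothetical model of the proposed optimization (names are illustrative).
// deps.needWholeDocument: true if some later stage may read any field.
// deps.fields: Set of top-level field paths the rest of the pipeline reads.
function planForRandomCursor(deps) {
  if (deps.needWholeDocument || [...deps.fields].some((f) => f !== "_id")) {
    // Current behaviour: fetch the full document for every sampled key.
    return ["FETCH", "INDEX_ITERATOR"];
  }
  if (deps.fields.size === 0) {
    // No dependencies: the index cursor alone is enough.
    return ["INDEX_ITERATOR"];
  }
  // Only _id is needed: rebuild a document from the index key instead.
  return ["PROJECTION", "INDEX_ITERATOR"];
}

// What the PROJECTION would do, assuming the key stores the _id value under
// an empty field name.
function indexKeyToDocument(indexKey) {
  return { _id: indexKey[""] };
}

// $bucketAuto with groupBy "$_id" depends only on _id.
const bucketAutoDeps = { needWholeDocument: false, fields: new Set(["_id"]) };
console.log(planForRandomCursor(bucketAutoDeps)); // [ 'PROJECTION', 'INDEX_ITERATOR' ]
console.log(indexKeyToDocument({ "": 42 })); // { _id: 42 }
```

Keeping the FETCH as the fallback means any pipeline whose dependencies cannot be proven to be _id-only behaves exactly as it does today.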

Assignee: [DO NOT USE] Backlog - Query Optimization
Reporter: Charlie Swanson
Participants: [DO NOT USE] Backlog - Query Optimization, Charlie Swanson, David Storch
Votes: 0
Watchers: 7

Created: Dec 12 2016 09:00:45 PM UTC
Updated: Jan 19 2023 07:47:59 PM UTC
Resolved: Jan 19 2023 07:47:59 PM UTC
