- Type: Improvement
- Resolution: Won't Do
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: Aggregation Framework
- Query Optimization
The optimized code path that uses a random cursor to provide a $sample stage currently unconditionally appends a FETCH stage on top of the index scan: https://github.com/mongodb/mongo/blob/r3.4.0-rc2/src/mongo/db/pipeline/pipeline_d.cpp#L253-L254
The FETCH is unnecessary in cases like the following, where only the _id is needed to answer the aggregation:
> db.foo.drop();
true
> for (var i = 0; i < 10000; i++) { db.foo.insert({_id: i}); }
WriteResult({ "nInserted" : 1 })
> db.foo.explain().aggregate([{$sample: {size: 10}}, {$bucketAuto: {groupBy: "$_id", buckets: 2}}])
{
    "stages" : [
        {
            "$cursor" : {
                "query" : {
                },
                "fields" : {
                    "_id" : 1
                },
                "queryPlanner" : {
                    "plannerVersion" : 1,
                    "namespace" : "test.foo",
                    "indexFilterSet" : false,
                    "winningPlan" : {
                        "stage" : "FETCH",  // This FETCH stage is not necessary.
                        "inputStage" : {
                            "stage" : "INDEX_ITERATOR"
                        }
                    },
                    "rejectedPlans" : [ ]
                }
            }
        },
        {
            "$sampleFromRandomCursor" : {
                "size" : NumberLong(10)
            }
        },
        {
            "$bucketAuto" : {
                "groupBy" : "$_id",
                "buckets" : 2,
                "output" : {
                    "count" : {
                        "$sum" : {
                            "$const" : 1
                        }
                    }
                }
            }
        }
    ],
    "ok" : 1,
    "operationTime" : Timestamp(0, 0)
}
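To check from the shell whether a given plan still contains the FETCH, the explain output above could be inspected with a small helper. This is only a sketch: planHasStage is a hypothetical name, not a shell built-in, and it assumes only that child plans are nested under "inputStage" (as in the output above) or under an "inputStages" array.

```javascript
// Hypothetical helper (not part of the mongo shell): returns true if the
// plan tree contains a stage with the given name.
// Assumption: children are nested under "inputStage" (single child) or
// "inputStages" (array of children).
function planHasStage(plan, stageName) {
  if (!plan || typeof plan !== "object") {
    return false;
  }
  if (plan.stage === stageName) {
    return true;
  }
  var children = [];
  if (plan.inputStage) {
    children.push(plan.inputStage);
  }
  if (Array.isArray(plan.inputStages)) {
    children = children.concat(plan.inputStages);
  }
  return children.some(function (child) {
    return planHasStage(child, stageName);
  });
}

// The winning plan copied from the explain output in this ticket.
var winningPlan = {
  stage: "FETCH",
  inputStage: { stage: "INDEX_ITERATOR" }
};
```

Against real explain output, the plan would be read from stages[0].$cursor.queryPlanner.winningPlan and passed to planHasStage together with "FETCH".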
This optimization is valid if the pipeline either has no dependencies or depends only on the _id. To check this, we'll need to move the dependency calculation up so that it runs before the $sample stage is handled. If the only dependency is the _id, we'll also need to add a PROJECTION stage to transform the index key into a full-blown document.
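The decision described here can be sketched as a small function. It is an illustration only, not the server's C++ code: planRandomCursorStages, the deps object, and its fields and needWholeDocument properties are all hypothetical names. It also assumes that in the no-dependency case nothing needs to be stacked on the index scan, which the description implies but does not state.

```javascript
// Hypothetical sketch of the proposed decision; the names are illustrative
// and are not taken from the server source.
//   deps.fields            - array of field paths the pipeline depends on
//   deps.needWholeDocument - true if some stage needs the entire document
// Returns the stage names from the leaf (index scan) upwards.
function planRandomCursorStages(deps) {
  // every() is true for an empty array, so "no dependencies" counts as
  // "no dependency other than _id".
  var onlyId = deps.fields.every(function (field) {
    return field === "_id";
  });

  if (deps.needWholeDocument || !onlyId) {
    // Current behavior: fetch the full document for each index key.
    return ["INDEX_ITERATOR", "FETCH"];
  }
  if (deps.fields.length === 0) {
    // Assumption: with no dependencies, the index scan alone is enough.
    return ["INDEX_ITERATOR"];
  }
  // Only _id is needed: turn the index key into a document without a FETCH.
  return ["INDEX_ITERATOR", "PROJECTION"];
}
```

For the pipeline in this ticket, where $bucketAuto groups by "$_id", the dependency set would be just _id, so the sketch picks the index scan plus a PROJECTION instead of a FETCH.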