-
Type: Improvement
-
Resolution: Done
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: Aggregation Framework
-
Fully Compatible
-
Query 2017-05-08, Query 2017-05-29, Query 2019-10-07, Query 2019-10-21, Query 2019-11-04
-
(copied to CRM)
-
15
In some situations, an aggregation pipeline with a sort will attempt to use an index to cover the sort. In such situations, the aggregation pipeline will always favor plans that do not have a blocking sort, even if other plans may be faster.
We should somehow integrate the aggregation framework and the query system more tightly so that we can consider both blocking and non-blocking plans at the same time, or we should try to otherwise determine if the blocking plan would outperform the non-blocking plan.
Original Description
We have a dataset with some hundred thousand documents with aggregation framework we select a small amount of it and sort it by _id. Match works very fast (by index) but sort {_id: 1} seems to load all data instead of sorting only selected documents. If we sort by some other field it works like expected.
function test(num) { function time(fun) { var start = (new Date).getTime(); fun.apply(); return (new Date).getTime() - start; } if (num === undefined) { num = 1000; } db.demo.remove(); for (var i = 0; i < num; ++i) { db.demo.insert({num: i}); } db.demo.ensureIndex({num: 1}); var query = {num: {$lt: 10}}; var sortId = {_id: 1}; var sortNum = {num: 1}; printjson(db.demo.find(query).explain()); print("find sort by id: " + time(function () { db.demo.find(query).sort(sortId); })); print("aggregate sort by id: " + time(function () { db.demo.aggregate([{$match: query}, {$sort: sortId}]); })); print("find sort by num: " + time(function () { db.demo.find(query).sort(sortNum); })); print("aggregate sort by num: " + time(function () { db.demo.aggregate([{$match: query}, {$sort: sortNum}]); })); }
test(100000) { "cursor" : "BtreeCursor num_1", "isMultiKey" : false, "n" : 10, "nscannedObjects" : 10, "nscanned" : 10, "nscannedObjectsAllPlans" : 10, "nscannedAllPlans" : 10, "scanAndOrder" : false, "indexOnly" : false, "nYields" : 0, "nChunkSkips" : 0, "millis" : 0, "indexBounds" : { "num" : [ [ -1.7976931348623157e+308, 10 ] ] }, "server" : "mongodb01:27228", "millis" : 0 } find sort by id: 0 aggregate sort by id: 373 find sort by num: 0 aggregate sort by num: 1
- is depended on by
-
SERVER-26442 Push $sort before $project and $addFields
- Open
- is duplicated by
-
SERVER-27250 Aggregation Pipeline Favors Indexes for $sort phase over $match phase
- Closed
-
SERVER-21471 Bad index selection on aggregate with $match + $sort
- Closed
-
SERVER-26337 performance problem with $limit in aggregate query
- Closed
-
SERVER-27076 Query Planner uses inefficient plan for regular expression + sort with aggregations
- Closed
-
SERVER-27150 Performance degradation with $sort on indexed field
- Closed
-
SERVER-30696 $sort in aggregate is super slow
- Closed
-
SERVER-32186 Query Planner selects wrong index
- Closed
-
SERVER-44309 Not full index use in aggregation pipelines
- Closed
-
SERVER-43221 Applying index makes aggregation query go significantly slower
- Closed
- is related to
-
SERVER-31051 Count Is very very slow using aggregation frame work.
- Closed
-
SERVER-7944 add index hint support for operations that read indexes
- Closed
-
SERVER-24860 Optimize away entire pipeline if it can be answered using a query
- Closed
- related to
-
SERVER-21783 aggregation framework indexed date(IsoDate) field sorting bug
- Closed
-
SERVER-43816 Push $text and $sort with $meta in the sort pattern into the PlanStage layer
- Closed
-
SERVER-44309 Not full index use in aggregation pipelines
- Closed