Type: Improvement
Resolution: Done
Priority: Major - P3
Affects Version/s: 2.4.3
Component/s: Aggregation Framework
Fully Compatible
Query 2018-06-04, Query 2018-08-13, Query 2018-08-27, Query 2018-09-10, Query 2018-09-24, Query 2018-10-08
This is an analogue to SERVER-2094 ("distinct cheat with indexes"), but for the aggregation framework.
The goal of this performance improvement is to let $group stages that use accumulators such as $first take advantage of sorted pipeline input. Because only the first document of each group matters, the query can skip over large runs of index entries that cannot change the result, which reduces the number of index entries scanned.
For example, suppose a collection has an index {x:1,y:1} and that x has low cardinality (few distinct values). Consider the following pipeline:
db.foo.aggregate([{$sort:{x:1,y:1}},{$group:{_id:{x:"$x"},y:{$first:"$y"}}}])
Currently, this pipeline performs a full scan of the index. With this optimization, it only needs to examine on the order of |x| index entries, where |x| is the number of distinct values of x. That is much smaller than the total number of entries in the index.
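The |x| bound can be illustrated with a small simulation. The sketch below is a hypothetical in-memory model, not MongoDB's implementation: the index {x:1,y:1} is modeled as a sorted array of [x, y] pairs, and the helper names (fullScanFirst, distinctScanFirst, seekPastX) are invented for illustration. The distinct scan reads one key per distinct x value, records its y as the $first value, then seeks past every remaining key with that x. The binary-search seek stands in for a B-tree seek and is not counted as an examined key.

```javascript
// Hypothetical model of a compound index {x: 1, y: 1}: a sorted array of
// [x, y] keys. Illustrative only; not MongoDB's actual index code.

// Order keys by x, then by y (ascending), like the {x: 1, y: 1} index.
function compareKeys(a, b) {
  if (a[0] !== b[0]) return a[0] < b[0] ? -1 : 1;
  if (a[1] !== b[1]) return a[1] < b[1] ? -1 : 1;
  return 0;
}

// Full index scan: visit every key and keep the first y seen for each x,
// which matches $first over input sorted by {x: 1, y: 1}.
function fullScanFirst(index) {
  const result = new Map();
  let keysExamined = 0;
  for (const [x, y] of index) {
    keysExamined++;
    if (!result.has(x)) result.set(x, y);
  }
  return { result, keysExamined };
}

// Binary search for the first position >= lo whose x is greater than `x`.
// Stands in for a B-tree seek; its comparisons are not counted as keys examined.
function seekPastX(index, x, lo) {
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (index[mid][0] <= x) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Distinct scan: read one key per distinct x, then skip the rest of that x.
function distinctScanFirst(index) {
  const result = new Map();
  let keysExamined = 0;
  let pos = 0;
  while (pos < index.length) {
    const [x, y] = index[pos];
    keysExamined++;
    result.set(x, y); // first key for this x holds the $first y value
    pos = seekPastX(index, x, pos + 1);
  }
  return { result, keysExamined };
}

// Sample index: 3 distinct x values, 1000 y values each.
const index = [];
for (let x = 0; x < 3; x++) {
  for (let y = 0; y < 1000; y++) index.push([x, y]);
}
index.sort(compareKeys);

console.log(fullScanFirst(index).keysExamined, distinctScanFirst(index).keysExamined); // 3000 3
```

On this sample the full scan examines 3,000 keys while the distinct scan examines 3, and both return the same $first value for each x. The saving grows with the number of index entries per distinct x, which is why the optimization targets low-cardinality leading fields.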
This ticket was filed as a result of discussion in SERVER-9272, where the full use case is described.
is depended on by:
- SERVER-36517 Allow allPaths indexes to provide DISTINCT_SCAN (Closed)

is duplicated by:
- SERVER-9272 Querying latest document based on a set of field (Closed)
- SERVER-31269 Too many documents examined when using an index and $first/$last in $group stage (Closed)

is related to:
- SERVER-97238 $group with $first/$last to distinct scan optimization might incorrectly unwind arrays (In Progress)
- SERVER-69359 Aggregate query bails on DISTINCT_SCAN and uses IXSCAN (Closed)
- SERVER-2130 Ability to use Limit() with Distinct() (Backlog)
- SERVER-27915 Make $group with $addToSet accumulator use DISTINCT_SCAN when applicable (Backlog)
- SERVER-2094 distinct cheat with indexes (Closed)
- SERVER-15291 slow '$group' performance (Closed)
- SERVER-29244 CLONE - distinct cheat with indexes (Closed)

related to:
- SERVER-37459 $group with $$ROOT returns error (Closed)
- SERVER-85213 Rewrite $sort+$group with $first/$last to use $top/$bottom (Backlog)
- SERVER-4507 aggregation: optimize $group to take advantage of sorted sequences (Backlog)
- SERVER-23732 Aggregation should optimize an irrelevant $sort preceding a $group (Backlog)
- SERVER-28980 aggregation can subsume $sort into $group when $first/$last are present (Backlog)
- SERVER-37715 Use DISTINCT_SCAN for $unwind-$group pipelines (Backlog)
- SERVER-37304 Extend $sort+$group+$first pipeline optimization to $last (Closed)
- SERVER-40090 DISTINCT_SCAN in agg is only used when certain format of _id is specified (Closed)
- SERVER-55576 Optimize queries on time-series collections which request the most recent value (Closed)