-
Type: New Feature
-
Resolution: Unresolved
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
Query Optimization
Inspired by some similar capability in ElasticSearch: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html#run-sub-aggs
Here's an example. Data set:
{_id: 1, party: "dem", state: "NY", metric: 10} {_id: 2, party: "dem", state: "NY", metric: 14} {_id: 3, party: "rep", state: "PA", metric: 16} {_id: 4, party: "dem", state: "PA", metric: 12} {_id: 5, party: "rep", state: "NY", metric: 8} {_id: 6, party: "dem", state: "PA", metric: 9}
I could run this aggregation:
{ $group: { _id: "$state", stateTotalMetric: {$sum: "$metric"}, count: {$sum: 1}, byParty: { $group: { _id: "$party", count: {$sum: 1}, totalMetric: {$sum: "$metric"} } } } }
And I would get this result:
{ _id: "NY", stateTotalMetric: 32, count: 3, byParty: [ {_id: "dem", count: 2, totalMetric: 24}, {_id: "rep", count: 1, totalMetric: 8} ] } { _id: "PA", stateTotalMetric: 37, count: 3, byParty: [ {_id: "dem", count: 2, totalMetric: 21}, {_id: "rep", count: 1, totalMetric: 16} ] }
This computation is obviously expressible today, but here's how you would have to do it:
{ $group: { _id: {state: "$state", party: "$party}, count: {$sum: 1}, totalMetric: {$sum: "$metric"} } }, { $group: { _id: "$_id.state", stateTotalMetric: {$sum: "$totalMetric"}, count: {$sum: "$count"}, byParty: { $push: { _id: "$_id.party", count: "$count", totalMetric: "$totalMetric" } } } }
Obviously, this is longer and more cumbersome. It also gets worse and more error prone if I had used $avg instead of $sum, where it would be incorrect or at least very different to take an average of averages, and I would have to track counts and sums independently in the first $group.
- is related to
-
SERVER-66701 Optimize $addToSet accumulators into $group stages where possible
- In Progress
-
SERVER-29930 allow nested $facet in another $facet of aggregate
- Backlog
- related to
-
SERVER-62405 Partially-streaming compound $sort
- Backlog