- Type: Improvement
- Resolution: Duplicate
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Aggregation Framework
- Query
$group appears to be a blocking stage regardless of its input. When the input is already sorted on the grouping key, even a trivial grouping could avoid blocking to a large degree and would certainly use less memory.
For example, using the "tweets" data set from the University with an index on "user.screen_name":
>db.tweets.aggregate( [ { $sort: { "user.screen_name": 1 } }, { $group: { _id: "$user.screen_name", tweets: { $push: "$$CURRENT" } } } ])
Fails with:
"Exceeded memory limit for $group, but didn't allow external sort. Pass allowDiskUse:true to opt in."
The results going into the $group are already sorted on the grouping key. Each time a new key value is consumed, the previous value can never be seen again, so the previous value's bucket could be emitted and a new bucket started, with no need to block.
Notably, this would reduce the memory footprint of the $group stage to roughly one bucket at a time, which in cases like this one should keep it from exceeding the limit.
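The behaviour requested above can be sketched in plain JavaScript. This is only an illustration of the idea, not how the server implements $group: `streamingGroup` and `keyOf` are hypothetical names, the sample documents are made up, and the sketch assumes the grouping key is a primitive value (here a string) so that `!==` is a valid comparison.

```javascript
// Hypothetical helper, not MongoDB's implementation: groups documents that
// arrive already sorted on the grouping key, holding one bucket at a time.
function* streamingGroup(sortedDocs, keyOf) {
  let bucket = null; // the only group kept in memory
  for (const doc of sortedDocs) {
    const key = keyOf(doc);
    if (bucket !== null && bucket._id !== key) {
      // The key changed, so the previous key can never appear again:
      // emit its bucket immediately instead of blocking until the end.
      yield bucket;
      bucket = null;
    }
    if (bucket === null) {
      bucket = { _id: key, tweets: [] };
    }
    bucket.tweets.push(doc); // mirrors { $push: "$$CURRENT" }
  }
  if (bucket !== null) {
    yield bucket; // flush the final group
  }
}

// Made-up sample input, already sorted by user.screen_name.
const sampleDocs = [
  { user: { screen_name: "alice" }, text: "a1" },
  { user: { screen_name: "alice" }, text: "a2" },
  { user: { screen_name: "bob" }, text: "b1" },
  { user: { screen_name: "carol" }, text: "c1" },
];

const groups = [...streamingGroup(sampleDocs, (d) => d.user.screen_name)];
```

Because each bucket is yielded as soon as the key changes, a consumer can begin processing the first group before the input is exhausted, and peak memory is bounded by the largest single group rather than by the whole result set.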
- duplicates SERVER-4507 aggregation: optimize $group to take advantage of sorted sequences (Backlog)