- Type: Improvement
- Resolution: Duplicate
- Priority: Major - P3
- Affects Version/s: 2.6.1
- Component/s: Performance
It seems that some $group queries that could be satisfied with index-only scans are instead performing unnecessary full collection scans.
Here are more details, as I originally described in http://stackoverflow.com/questions/24980525/mongodb-slow-group-performance.
I have a MongoDB collection of over 1,000,000 records. Each record is around 20KB, so the total collection size is around 20GB.
The collection has a 'type' field that can take around 10 different values, and there is an index on the 'type' field. I would like to get the per-type counts for the collection.
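For reference, a minimal sketch of the setup in pymongo; the connection details and the database name 'my_db' are assumptions, while 'my_colc' and the 'type' index come from the description above:

# Setup sketch (assumes a local mongod and pymongo installed)
from pymongo import MongoClient

my_db = MongoClient()['my_db']        # hypothetical database name
my_db.my_colc.create_index('type')    # the single-field index on 'type'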
I've tested two different approaches (assume Python syntax):
- A naive method, issuing a 'count' call for each distinct value:

counters = {}
for type_val in my_db.my_colc.distinct('type'):
    counters[type_val] = my_db.my_colc.find({'type': type_val}).count()
- Using the aggregation framework with a '$group' stage:

counters = my_db.my_colc.aggregate([{'$group': {'_id': '$type', 'agg_val': {'$sum': 1}}}])
The first approach is about two orders of magnitude faster than the second (roughly 1 minute vs. 45 minutes). This appears to be because count can be answered from the index alone, without accessing the documents, while $group has to go over the documents one by one.
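One way to confirm this is to compare the query plans. A hedged sketch, assuming pymongo and the setup above; the 'cursor' and 'indexOnly' fields follow the 2.6-era explain output format:

# A covered query reports a BtreeCursor with indexOnly true;
# a full collection scan reports a BasicCursor
plan = my_db.my_colc.find({'type': 'some_value'}).explain()
print(plan.get('cursor'), plan.get('indexOnly'))

# The aggregation pipeline can be explained through the database command
agg_plan = my_db.command('aggregate', 'my_colc',
                         pipeline=[{'$group': {'_id': '$type',
                                               'agg_val': {'$sum': 1}}}],
                         explain=True)
print(agg_plan)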
Is there any way to run an efficient grouping query on the 'type' index, one that uses only the index and achieves the performance of approach #1, but through the aggregation framework?
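In the meantime, approach #1 can be packaged up explicitly. A sketch, not an answer to this ticket; the helper name grouped_counts is hypothetical, and pymongo is assumed:

# Workaround sketch: per-type counts via index-backed count queries
def grouped_counts(colc, field='type'):    # hypothetical helper
    counts = {}
    for val in colc.distinct(field):
        # hint() pins the planner to the single-field index, so each
        # count is answered from the index without fetching documents
        counts[val] = colc.find({field: val}).hint([(field, 1)]).count()
    return counts

counters = grouped_counts(my_db.my_colc)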
- duplicates: SERVER-11447 aggregation can sort using index to speed up group of an indexed field (Closed)
- related to: SERVER-9507 Optimize $sort+$group+$first pipeline to avoid full index scan (Closed)