-
Type: Bug
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: None
-
Query Optimization
-
Fully Compatible
-
ALL
-
QO 2024-02-05, QO 2024-02-19, QO 2024-03-04
-
110
If I sort by {a: -1} and use $first, I should get the largest 'a' value within each group.
For example, using $bucket I get the expected result:
> db.c.find() { "_id" : 0, "a" : 0 } { "_id" : 1, "a" : 1 } { "_id" : 2, "a" : 2 } { "_id" : 3, "a" : 3 } { "_id" : 4, "a" : 4 } { "_id" : 5, "a" : 5 } { "_id" : 6, "a" : 6 } { "_id" : 7, "a" : 7 } { "_id" : 8, "a" : 8 } { "_id" : 9, "a" : 9 } > db.c.aggregate([ {$sort: {a: -1}}, {$bucket: {groupBy: "$a", boundaries: [0, 5, 10], output: {a: {$first: "$a"}} }} ]) { "_id" : 0, "a" : 4 } { "_id" : 5, "a" : 9 }
While the same thing with $bucketAuto gives the smallest value within each group:
> db.c.aggregate([ {$sort: {a: -1}}, {$bucketAuto: {groupBy: "$a", buckets: 2, output: {a: {$first: "$a"}} }} ]) { "_id" : { "min" : 0, "max" : 5 }, "a" : 0 } { "_id" : { "min" : 5, "max" : 9 }, "a" : 5 }
It looks like internally $bucketAuto is doing something like:
- stable-sort by the group key
- scan the sorted input
- accumulate each document into the current bucket
- start a new bucket every N documents (based on input size / num buckets)
But this doesn’t work for order-sensitive accumulators like $first, because the sort loses some information about the original order--even using a stable sort. The documents within each dynamically-chosen bucket are sorted by (group key, input position), instead of by input position.
- is depended on by
-
SERVER-81571 Reconsider stable sort in sorter.cpp
- Closed
- related to
-
SERVER-85424 $group relies on stable sorting
- Closed
-
SERVER-87459 $bucketAuto $mergeObject does not obey input order
- Closed
-
SERVER-81571 Reconsider stable sort in sorter.cpp
- Closed