- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: 5.0.8, 5.0.6
- Component/s: Query Execution, Query Planning
- Fully Compatible
- ALL
- v6.0, v5.3, v5.0
- QE 2022-06-13, QO 2022-07-11, QO 2022-07-25
I'm using a MongoDB aggregation pipeline with $sampleRate to improve query performance, and I ran into behavior I don't understand.
Here is my aggregation pipeline, running on a large collection (1M+ documents):
[
  {
    '$match': {
      publishedAt: {
        '$gt': new Date('2021-04-27T22:00:00.000Z'),
        '$lt': new Date('2022-04-28T21:59:59.999Z')
      },
      //... some other matching fields
    }
  },
  {
    '$group': {
      _id: {
        keyWords: '$keyWords', // This is an Array<String>
        //... some other fields
      },
      first: { '$first': '$$CURRENT' }
    }
  },
  { '$match': { '$sampleRate': 0.25 } }, // This is where I do my sampling
  { '$replaceRoot': { newRoot: '$first' } },
  {
    '$project': {
      _id: true,
      //... some other fields
    }
  }
]
With this pipeline I get roughly twice as many documents as when I swap the $replaceRoot and $sampleRate stages:
[
  {
    '$match': {
      publishedAt: {
        '$gt': new Date('2021-04-27T22:00:00.000Z'),
        '$lt': new Date('2022-04-28T21:59:59.999Z')
      },
      //... some other matching fields
    }
  },
  {
    '$group': {
      _id: {
        keyWords: '$keyWords', // This is an Array<String>
        //... some other fields
      },
      first: { '$first': '$$CURRENT' }
    }
  },
  { '$replaceRoot': { newRoot: '$first' } },
  { '$match': { '$sampleRate': 0.25 } }, // This is where I do my sampling
  {
    '$project': {
      _id: true,
      //... some other fields
    }
  }
]
I don't understand why. Both pipelines should return approximately the same number of documents.
Am I misunderstanding something, or is this a bug?
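To make the expectation concrete, here is a minimal sketch in plain JavaScript (no MongoDB involved). It assumes that $sampleRate keeps each incoming document independently with the given probability and that $replaceRoot maps each document one-to-one. The helper names (`mulberry32`, `sampleRate`, `replaceRoot`, `groups`) are hypothetical stand-ins for illustration, not server code. Under those assumptions, the order of the two stages should not change the number of documents returned.

```javascript
// Small seeded PRNG so the sketch is reproducible (illustrative only).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // value in [0, 1)
  };
}

// Assumed model of { $match: { $sampleRate: rate } }:
// keep each document independently with probability `rate`.
function sampleRate(docs, rate, rand) {
  return docs.filter(() => rand() < rate);
}

// Assumed model of { $replaceRoot: { newRoot: '$first' } }:
// a one-to-one mapping, so it never changes the document count.
function replaceRoot(docs) {
  return docs.map((d) => d.first);
}

// Hypothetical output of the $group stage: one document per group.
const groups = Array.from({ length: 100000 }, (_, i) => ({
  _id: { keyWords: ['k' + i] },
  first: { _id: i },
}));

// Pipeline 1: sample, then replace the root.
const sampledFirst = replaceRoot(sampleRate(groups, 0.25, mulberry32(1)));
// Pipeline 2: replace the root, then sample (same seed, same random draws).
const replacedFirst = sampleRate(replaceRoot(groups), 0.25, mulberry32(1));

console.log('sample then replaceRoot:', sampledFirst.length);
console.log('replaceRoot then sample:', replacedFirst.length);
```

In this model both counts land near 25% of the group count, which is why a factor-of-two difference between the two real pipelines looks wrong to me.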
PS: I also posted this question here: https://stackoverflow.com/questions/72048023/mongodb-aggregate-pipeline-sampling-fail
- is caused by: SERVER-39938 aggregation $match before $lookup optimization doesn't happen when $expr: $eq is used (Closed)