  1. Core Server
  2. SERVER-66072

$match sampling and $group aggregation strange behavior

    • Fully Compatible
    • ALL
    • v6.0, v5.3, v5.0
    • QE 2022-06-13, QO 2022-07-11, QO 2022-07-25

      I'm using a MongoDB aggregation pipeline with $sampleRate to improve query performance, and I ran into behavior I don't understand.

      Here is my aggregation pipeline, which runs on a large collection (1M+ documents):


          [
            {
              '$match': {
                publishedAt: {
                  '$gt': new Date('2021-04-27T22:00:00.000Z'),
                  '$lt': new Date('2022-04-28T21:59:59.999Z')
                },
                //... some other matching fields
              }
            },
            {
              '$group': {
                _id: {
                  keyWords: '$keyWords', // This is an Array<String>
                  //... some other fields
                },
                first: { '$first': '$$CURRENT' }
              }
            },
            { '$match': { '$sampleRate': 0.25 } }, // This is where I do my sampling
            { '$replaceRoot': { newRoot: '$first' } },
            {
              '$project': {
                _id: true,
                //... some other fields
              }
            }
          ]

      With this pipeline I get approximately twice as many documents as when I swap the $replaceRoot and $sampleRate stages:


          [
            {
              '$match': {
                publishedAt: {
                  '$gt': new Date('2021-04-27T22:00:00.000Z'),
                  '$lt': new Date('2022-04-28T21:59:59.999Z')
                },
                //... some other matching fields
              }
            },
            {
              '$group': {
                _id: {
                  keyWords: '$keyWords', // This is an Array<String>
                  //... some other fields
                },
                first: { '$first': '$$CURRENT' }
              }
            },
            { '$replaceRoot': { newRoot: '$first' } },
            { '$match': { '$sampleRate': 0.25 } }, // This is where I do my sampling
            {
              '$project': {
                _id: true,
                //... some other fields
              }
            }
          ]

      I don't understand why. I would expect both pipelines to return roughly the same number of documents.
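
      The expectation above can be illustrated with a small simulation in plain JavaScript (no MongoDB involved). It is only a sketch and assumes that { $match: { $sampleRate: r } } keeps each incoming document independently with probability r. The helpers sampleRate and replaceRoot and the generated groups array are hypothetical stand-ins, not the server's implementation.

```javascript
// Plain JavaScript simulation; no MongoDB connection is involved.
// Assumption: { $match: { $sampleRate: r } } keeps each input document
// independently with probability r.
function sampleRate(docs, rate, random = Math.random) {
  return docs.filter(() => random() < rate);
}

// Stand-in for { $replaceRoot: { newRoot: '$first' } }:
// exactly one output document per input document.
function replaceRoot(docs) {
  return docs.map((doc) => doc.first);
}

// Hypothetical $group output: 100,000 groups, each holding its first document.
const groups = Array.from({ length: 100000 }, (_, i) => ({
  _id: { keyWords: [`k${i}`] },
  first: { _id: i },
}));

const sampleThenReplace = replaceRoot(sampleRate(groups, 0.25));
const replaceThenSample = sampleRate(replaceRoot(groups), 0.25);

// Under the stated assumption, both counts land near 0.25 * 100,000.
console.log(sampleThenReplace.length, replaceThenSample.length);
```

      Because the $replaceRoot stand-in emits one document per input, the position of the sampling step does not change the expected count in this model. A factor-of-two difference on the server would therefore come from something this sketch does not capture.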

      Am I misunderstanding something, or is this a bug?
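
      To compare the two orderings on the same data, one option is to end each pipeline with a $count stage instead of $project. The sketch below only builds the two stage arrays. The helper name buildPipeline and the output field name 'sampled' are hypothetical, and the $match and $group bodies are limited to the fields shown in this report (the elided fields are left out).

```javascript
// Hypothetical helper: builds either stage ordering and appends a $count stage.
// Only the fields shown in the report are included; elided fields are omitted.
function buildPipeline(sampleBeforeReplaceRoot) {
  const sample = { $match: { $sampleRate: 0.25 } };
  const replace = { $replaceRoot: { newRoot: '$first' } };
  return [
    {
      $match: {
        publishedAt: {
          $gt: new Date('2021-04-27T22:00:00.000Z'),
          $lt: new Date('2022-04-28T21:59:59.999Z'),
        },
      },
    },
    { $group: { _id: { keyWords: '$keyWords' }, first: { $first: '$$CURRENT' } } },
    // The only difference between the two variants is the order of these stages.
    ...(sampleBeforeReplaceRoot ? [sample, replace] : [replace, sample]),
    { $count: 'sampled' },
  ];
}

const sampleFirst = buildPipeline(true);
const replaceFirst = buildPipeline(false);

// Print the stage order of each variant.
console.log(sampleFirst.map((stage) => Object.keys(stage)[0]).join(' -> '));
console.log(replaceFirst.map((stage) => Object.keys(stage)[0]).join(' -> '));
```

      Each array could then be passed to aggregate() on the affected collection in mongosh. Running both several times and comparing the counts would show whether the difference is systematic or just sampling noise.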

      PS: I also asked this question on Stack Overflow: https://stackoverflow.com/questions/72048023/mongodb-aggregate-pipeline-sampling-fail

            alya.berciu@mongodb.com Alya Berciu
            cjbjohan.maupetit@laposte.net Johan Maupetit
