Core Server / SERVER-66072

$match sampling and $group aggregation strange behavior

    • Fully Compatible
    • ALL
    • v6.0, v5.3, v5.0
    • QE 2022-06-13, QO 2022-07-11, QO 2022-07-25

      I'm using a MongoDB aggregation pipeline with $sampleRate to improve query performance, and I ran into a strange behavior that I don't understand.

      Here is my aggregation pipeline, running on a big collection (1M+ documents):

       

       [
            {
              '$match': {
                publishedAt: {
                  '$gt': new Date('2021-04-27T22:00:00.000Z'),
                  '$lt': new Date('2022-04-28T21:59:59.999Z')
                },
                //... some other matching fields
              }
            },
            {
              '$group': {
                _id: {
                  keyWords: '$keyWords', // This is an Array<String>
                  //... some other fields
                },
                first: { '$first': '$$CURRENT' }
              }
            },
            { '$match': { '$sampleRate': 0.25 } }, // This is where I do my sampling
            { '$replaceRoot': { newRoot: '$first' } },
            {
              '$project': {
                _id: true,
                //... some other fields
              }
            }
          ] 

      When I run this, I get approximately twice as many documents as when I swap the $replaceRoot and $sampleRate stages:

        

        [
            {
              '$match': {
                publishedAt: {
                  '$gt': new Date('2021-04-27T22:00:00.000Z'),
                  '$lt': new Date('2022-04-28T21:59:59.999Z')
                },
                //... some other matching fields
              }
            },
            {
              '$group': {
                _id: {
                  keyWords: '$keyWords', // This is an Array<String>
                  //... some other fields
                },
                first: { '$first': '$$CURRENT' }
              }
            },
            { '$replaceRoot': { newRoot: '$first' } },
            { '$match': { '$sampleRate': 0.25 } }, // This is where I do my sampling
            {
              '$project': {
                _id: true,
                //... some other fields
              }
            }
          ]

      I don't understand why. I would expect both pipelines to return about the same number of documents.

      Do you know what I'm misunderstanding? Or is it a bug?
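
      To make the comparison easier to reproduce, both orderings can be built from one helper and their result counts compared side by side. This is only a minimal sketch: `buildPipeline` and `compareCounts` are hypothetical helper names, `collection` is assumed to expose `aggregate(pipeline).toArray()` as the Node.js driver and mongosh do, and the elided match, group, and project fields from the pipelines above are left out rather than guessed.

```javascript
// Hypothetical helper: builds the pipeline with the $sampleRate stage placed
// either before or after $replaceRoot. Fields elided in the report are omitted.
function buildPipeline(sampleBeforeReplaceRoot) {
  const sample = { $match: { $sampleRate: 0.25 } };
  const replaceRoot = { $replaceRoot: { newRoot: '$first' } };
  return [
    {
      $match: {
        publishedAt: {
          $gt: new Date('2021-04-27T22:00:00.000Z'),
          $lt: new Date('2022-04-28T21:59:59.999Z'),
        },
      },
    },
    {
      $group: {
        _id: { keyWords: '$keyWords' },
        first: { $first: '$$CURRENT' },
      },
    },
    // The only difference between the two pipelines is the order of these stages.
    ...(sampleBeforeReplaceRoot ? [sample, replaceRoot] : [replaceRoot, sample]),
    { $project: { _id: true } },
  ];
}

// Hypothetical helper: runs both orderings and returns how many documents
// each one produced. `collection` is an assumption, not a name from the report.
async function compareCounts(collection) {
  const sampleFirst = await collection.aggregate(buildPipeline(true)).toArray();
  const replaceFirst = await collection.aggregate(buildPipeline(false)).toArray();
  return { sampleFirst: sampleFirst.length, replaceFirst: replaceFirst.length };
}
```

      Because $sampleRate is random, the two counts will differ slightly from run to run, so repeating the comparison a few times helps separate noise from a consistent gap. Comparing the explain output of both pipelines may also show whether the server reordered any stages.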

      PS: I also posted this as a question on Stack Overflow: https://stackoverflow.com/questions/72048023/mongodb-aggregate-pipeline-sampling-fail

            Assignee:
            alya.berciu@mongodb.com Alya Berciu
            Reporter:
            cjbjohan.maupetit@laposte.net Johan Maupetit
            Votes:
            0
            Watchers:
            13
