Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-75267

Summing $bsonSize in $group stage with projected fields fails due to an invalid pushdown of $project

    • Type: Icon: Bug Bug
    • Resolution: Duplicate
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 5.2.0, 6.0.5
    • Component/s: None
    • None
    • Query Optimization
    • ALL
    • Hide

      Use any collection with at least 1 non-ID field. For example, in mongosh

       

      db.test.insertOne({foo: "bar"}) 

      First see that the object size with only _id projected field is 22:

       

       

      test> pipeline = [
        { '$project': { _id: 1 } },
        { '$project': { size_bytes: { '$bsonSize': '$$ROOT' } } }
      ]
      test> db.test.aggregate(pipeline)
      [ { _id: ObjectId("641dc5f51b47ae483a7f5235"), size_bytes: 22 } ]

       

      But the whole document including field "foo" is 35:

      test> pipeline = [
        { '$project': { size_bytes: { '$bsonSize': '$$ROOT' } } }
      ]
      test> db.test.aggregate(pipeline)
      [ { _id: ObjectId("641dc5f51b47ae483a7f5235"), size_bytes: 35 } ] 

      Now sum up the size of all documents (there's only 1) which is 35 but should be 22:

      test> db.version()
      6.0.5
      test> pipeline = [
        { '$project': { _id: 1 } },
        {
          '$group': { _id: null, size_bytes: { '$sum': { '$bsonSize': '$$ROOT' } } }
        }
      ]
      test> db.test.aggregate(pipeline)
      [ { _id: null, size_bytes: 35 } ]

      In version 5.0 it reports size 22:

      test> db.version()
      5.0.15
      test> db.test.aggregate(pipeline)
      [ { _id: null, size_bytes: 22 } ] 

      Using $unset stage to exclude a field does work as expected:

      test> pipeline = [
        { '$unset': 'foo' },
        {
          '$group': { _id: null, size_bytes: { '$sum': { '$bsonSize': '$$ROOT' } } }
        }
      ]
      test> db.test.aggregate(pipeline)
      [ { _id: null, size_bytes: 22 } ]

      Potential workaround of 2 stages to do the sum rather than 1:

      [
        { '$project': { _id: 1 } },
        { '$project': { size: { '$bsonSize': '$$ROOT' } } },
        { '$group': { _id: null, size_bytes: { '$sum': '$size' } } }
      ] 
      Show
      Use any collection with at least 1 non-ID field. For example, in mongosh   db.test.insertOne({foo: "bar" }) First see that the object size with only _id projected field is 22:     test> pipeline = [   { '$project' : { _id: 1 } },   { '$project' : { size_bytes: { '$bsonSize' : '$$ROOT' } } } ] test> db.test.aggregate(pipeline) [ { _id: ObjectId( "641dc5f51b47ae483a7f5235" ), size_bytes: 22 } ]   But the whole document including field "foo" is 35: test> pipeline = [   { '$project' : { size_bytes: { '$bsonSize' : '$$ROOT' } } } ] test> db.test.aggregate(pipeline) [ { _id: ObjectId( "641dc5f51b47ae483a7f5235" ), size_bytes: 35 } ] Now sum up the size of all documents (there's only 1) which is 35 but should be 22: test> db.version() 6.0.5 test> pipeline = [   { '$project' : { _id: 1 } },   {     '$group' : { _id: null , size_bytes: { '$sum' : { '$bsonSize' : '$$ROOT' } } }   } ] test> db.test.aggregate(pipeline) [ { _id: null , size_bytes: 35 } ] In version 5.0 it reports size 22: test> db.version() 5.0.15 test> db.test.aggregate(pipeline) [ { _id: null , size_bytes: 22 } ] Using $unset stage to exclude a field does work as expected: test> pipeline = [   { '$unset' : 'foo' },   {     '$group' : { _id: null , size_bytes: { '$sum' : { '$bsonSize' : '$$ROOT' } } }   } ] test> db.test.aggregate(pipeline) [ { _id: null , size_bytes: 22 } ] Potential workaround of 2 stages to do the sum rather than 1: [   { '$project' : { _id: 1 } },   { '$project' : { size: { '$bsonSize' : '$$ROOT' } } },   { '$group' : { _id: null , size_bytes: { '$sum' : '$size' } } } ]

      Grouping sum of $bsonSize gets the sum of full documents not just the projected fields.

      In MongoDB version 5.0, using the sum of document sizes with a group stage as recommended on this page works when fields are projected (see steps below).

      However in version 6.0, it merely returns the sum of the whole document size, disregarding any field projections. I believe this may have to do with the change in release 5.2 to using Slot Based Execution query engine to execute $group stages. In 5.0 the regular query execution engine is used vs. 6.0 the SBE is used.

      What is interesting is that if you use a $unset stage to exclude fields then the size is calculated correctly. So a potential workaround is to exclude fields rather than include fields.

      Another potential workaround is to use 2 stages (project document size then sum that size) rather than the 1 group stage - example shown below.

            Assignee:
            peter.volk@mongodb.com Peter Volk
            Reporter:
            stuart@voxel51.com Stuart Wheaton
            Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

              Created:
              Updated:
              Resolved: