Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-8141

Avoid treating arrays as literals in the aggregation pipeline

    • Type: Icon: New Feature New Feature
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • 3.1.6
    • Affects Version/s: None
    • Component/s: Aggregation Framework
    • None
    • Major Change
    • Quint Iteration 4, Quint Iteration 5, Quint Iteration 6

      In the aggregation framework, arrays are interpreted as literals, which is not the intuitive behavior. Take the following code as an example:

      > db.foo.drop()
      true
      > db.foo.insert({_id: 0, a: ['foo', 'bar', 'baz'], b: 'bar', c: 'Baz'})
      WriteResult({ "nInserted" : 1 })
      > db.foo.aggregate([{$project: {intersection: {$setIntersection: ['$a', ['$b', {$toLower: '$c'}]]}}}])
      { "_id" : 0, "intersection" : [ ] }  // Intuitively would expect [ 'bar', 'baz' ]
      > db.foo.insert({_id: 1, a: ['foo', '$b'], b: 'bar', c: 'Baz'})
      WriteResult({ "nInserted" : 1 })
      > db.foo.aggregate([{$project: {intersection: {$setIntersection: ['$a', ['$b', {$toLower: '$c'}]]}}}])
      { "_id" : 0, "intersection" : [ ] }
      { "_id" : 1, "intersection" : [ "$b" ] }  // Instead, it is matching against literals '$b' and {$toLower: '$c'}
      

      Instead of evaluating '$b' to be the value of the field 'b' in the current document, as is done with '$a', '$b' is treated as a literal when it is parsed inside of an array. Because arrays are treated as literals, it also prevents a projection stage from creating arrays of fields:

      > db.bar.drop()
      true
      > db.bar.insert({_id: 0, point: {x: 10, y: 20}})
      WriteResult({ "nInserted" : 1 })
      > db.bar.aggregate([{$project: {coords: ['$point.x', '$point.y']}}])
      assert: command failed: {
      	"errmsg" : "exception: disallowed field type Array in object expression (at 'coords')",
      	"code" : 15992,
      	"ok" : 0
      } : aggregate failed
      

      The aggregation framework should parse arrays in the same ways it parses expressions elsewhere in the aggregation framework. This would yield the following results from above:

      // With documents from collection 'foo' from above.
      > db.foo.aggregate([{$project: {intersection: {$setIntersection: ['$a', ['$b', {$toLower: '$c'}]]}}}])
      { "_id" : 0, "intersection" : [ "baz", "bar" ] }
      { "_id" : 1, "intersection" : [ ] }
      
      // With document from collection 'bar' from above.
      > db.bar.aggregate([{$project: {coords: ['$point.x', '$point.y']}}])
      { "_id" : 0, "coords" : [ 10, 20 ] }
      

      Note this is a backwards breaking change.
      =======================================
      Original description:
      Builds an array.

      input:
       {a:1, b:2, c:3}
      
      operation:
       {$project: {array: {$array: ['$a', '$c', {$add:['$b', '$c'] }] } } }
      
      output:
       {array: [1, 3, 5]}
      

      An issue is how to handle missing fields (eg {$push:['$d']}). When building an object we would omit that field from the output. This would be expected if the array is a list or set, but could be a problem if indexed access is important. Possible solutions:

      1. Omit the field, shrinking the array.
      2. Replace the field with null, keeping the array the same size, but creating a value and differing from object behavior.
      3. Error out.
      4. Separate operators like $array when indexes are important and $list for when they aren't.

            Assignee:
            charlie.swanson@mongodb.com Charlie Swanson
            Reporter:
            mathias@mongodb.com Mathias Stearn
            Votes:
            14 Vote for this issue
            Watchers:
            13 Start watching this issue

              Created:
              Updated:
              Resolved: