Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-79637

Incorrect query results in $lookup with TS foreign collection using a correlated predicate

    • Query Integration
    • Fully Compatible
    • ALL
    • v7.0, v6.0, v5.0
    • Hide
      > db.local.insertMany([{_id: 0, key: 1},{_id: 1, key: 2}])
      
      > db.createCollection("foreign", { timeseries: { timeField: "time", metaField: "meta", granularity: "minutes" }})
      > db.foreign.insertMany([{time: new Date(), _id: 0, meta: 1, val1: 42, val2: 100},{time: new Date(), _id: 1, meta: 2, val1: 17, val2: 100}])
      
      db.local.aggregate(
        {$lookup: {
          from: "foreign",
          let: {lkey: "$key"},
          pipeline: [
            {$match: {$expr: {$lt: ["$val1","$val2"]}}},
            {$match: {$expr: {$eq: ["$meta","$$lkey"]}}},
            {$project: {lkey: "$$lkey",fkey: "$meta",val: "$val1",_id: 0,}},
          ],
          as: "joined"
        }}
      )
      The result is:
      { "_id" : 0, "key" : 1, "joined" : [ { "lkey" : 1, "fkey" : 1, "val" : 42 } ] }
      { "_id" : 1, "key" : 2, "joined" : [ { "lkey" : 1, "fkey" : 1, "val" : 42 } ] }
      

      The match on the local record with key=2 is wrong.

      Show
      > db.local.insertMany([{_id: 0, key: 1},{_id: 1, key: 2}]) > db.createCollection( "foreign" , { timeseries: { timeField: "time" , metaField: "meta" , granularity: "minutes" }}) > db.foreign.insertMany([{time: new Date(), _id: 0, meta: 1, val1: 42, val2: 100},{time: new Date(), _id: 1, meta: 2, val1: 17, val2: 100}]) db.local.aggregate( {$lookup: { from: "foreign" , let: {lkey: "$key" }, pipeline: [ {$match: {$expr: {$lt: [ "$val1" , "$val2" ]}}}, {$match: {$expr: {$eq: [ "$meta" , "$$lkey" ]}}}, {$project: {lkey: "$$lkey" ,fkey: "$meta" ,val: "$val1" ,_id: 0,}}, ], as: "joined" }} ) The result is: { "_id" : 0, "key" : 1, "joined" : [ { "lkey" : 1, "fkey" : 1, "val" : 42 } ] } { "_id" : 1, "key" : 2, "joined" : [ { "lkey" : 1, "fkey" : 1, "val" : 42 } ] } The match on the local record with key=2 is wrong.
    • QI 2023-10-16, QI 2023-10-30, QI 2023-11-13, QI 2023-11-27, QI 2023-12-11, QI 2023-12-25, QI 2024-01-08, QI 2024-01-22
    • 200

      The bug is a result of two different optimizations not being aware of each other. Basically, this is what's going on when there are multiple $match stages and the correlated one isn't the first:
      1. TS-specific optimization of predicates replaces the correlated match with a filter on a collection access stage.
      2. Lookup-specific optimization that is supposed to cache results from an uncorrelated collection access fails to notice that it cannot be applied here and caches the results of the collection access with the first seen key, which are then reused with the following keys.

      Combining two $match stages into one makes it sufficiently complex to disable optimization #1.
      Swapping the two $match stages allows optimization #2 to realize the cache should be abandoned.

            Assignee:
            naama.bareket@mongodb.com Naama Bareket
            Reporter:
            irina.yatsenko@mongodb.com Irina Yatsenko (Inactive)
            Votes:
            1 Vote for this issue
            Watchers:
            8 Start watching this issue

              Created:
              Updated:
              Resolved: