Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-40635

Introduce $lookupOne as an optimzable aggregation step

    • Type: Icon: Improvement Improvement
    • Resolution: Duplicate
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Aggregation Framework
    • None

      Given 2 big collections: vouchers and voucher_transactions (referencing the vouchers via their voucher_id), and let's say, you are operating on the voucher_transactions and want the following:

      1. For every voucher_transaction, lookup the voucher that is references via the voucher_id:
        {"$lookup":{"as":"__joined_vouchers","foreignField":"_id","from":"vouchers","localField":"voucher_id"}}
      2. Then you usually do not want to have the array so you utilize 
        {"$unwind":{"path":"$__joined_vouchers","preserveNullAndEmptyArrays":true,}}
      3. And finally, you want to to sort and limit them.
        {"$sort":{"booked_at":-1,}}
        {"$limit":NumberLong('10')}

      There is one thing that prevents the optimizer in my eyes from moving the $limit and $sort before the $lookup: He can not know if the number of docuents after the $lookup and $unwind is the same like before, since $lookup creates and array. 

      But actually, there is a special case that happens very often: when the foreign key 'voucher_transactions.voucher_id' is really only pointing to exactly one 'voucher', since the 'voucher._id' is uniq. 

      In that case, it would make totally sense (in my eyes) to move the $sort and $limit steps befor the $lookup stages and increse the performance of the aggregation a lot. 

      So my suggetion is to introduce sth. like a $lookupOne aggregation function, that works like $lookup but attaches only one looked up document. 

      If you have control over the query, you might reorder the pipeline for yourself. But if you use the bi connector (/mongosqld), a JOIN operations is always translated to $lookup and $unwind steps and afterwards, the ORDERs and LIMITs  are applied as $sort and $limit stages. With the BI connector, there is no way to take influence in the order of the aggregation steps, afaik. However, with a 1:n relationship like that, the BI connector would be able to decide to use a potentially existing $lookupOne stage. 

            Assignee:
            eric.sedor@mongodb.com Eric Sedor
            Reporter:
            roman.lehnert@givve.com Roman Lehnert
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated:
              Resolved: