  Core Server / SERVER-69119

For queries using SBE, auto-parameterize predicates written using $expr

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • Query Execution
    • 10

      Consider the following two similar-looking queries:

      db.coll.find({a: {$eq: "constant"}})
      
      db.coll.find({$expr: {$eq: ["$a", "constant"]}})
      

      The queries are quite similar in meaning (though not identical) and both can use an index on {a: 1}. Therefore, you might expect that in both cases the "constant" gets auto-parameterized. However, this is not the case in the current implementation of auto-parameterization for the SBE plan cache. The constant in the first query will get auto-parameterized, but the constant beneath the $expr will not. In fact, we never auto-parameterize anything inside a $expr at the moment.
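The asymmetry can be illustrated with a toy model. This is only a sketch under stated assumptions, not the server's implementation: `toyCacheKey` is a hypothetical helper that replaces constants in a plain match predicate with a placeholder and, by default, leaves everything beneath $expr untouched, mimicking the behavior described above. Its `parameterizeExpr` flag is likewise hypothetical and models what this ticket proposes.

```javascript
// Toy model only: a hypothetical stand-in for deriving a plan cache key.
// It is NOT how the server computes SBE plan cache keys.
function toyCacheKey(filter, parameterizeExpr = false) {
    const walk = (node, insideExpr) => {
        if (Array.isArray(node)) {
            return node.map(n => walk(n, insideExpr));
        }
        if (node !== null && typeof node === "object") {
            const out = {};
            for (const [key, value] of Object.entries(node)) {
                out[key] = walk(value, insideExpr || key === "$expr");
            }
            return out;
        }
        // Inside $expr, strings such as "$a" are field paths and belong to the query shape.
        const isFieldPath = insideExpr && typeof node === "string" && node.startsWith("$");
        if (isFieldPath) return node;
        // Current behavior (modeled): constants beneath $expr are kept verbatim.
        if (insideExpr && !parameterizeExpr) return node;
        // Otherwise the constant is replaced by a parameter marker.
        return "?";
    };
    return JSON.stringify(walk(filter, false));
}

// Same key regardless of the constant:
console.log(toyCacheKey({a: {$eq: "constant"}}) === toyCacheKey({a: {$eq: "other"}}));
// Different keys, because the constant beneath $expr is part of the key:
console.log(toyCacheKey({$expr: {$eq: ["$a", "constant"]}}) ===
            toyCacheKey({$expr: {$eq: ["$a", "other"]}}));
```

In this model, passing `true` as the second argument makes both $expr queries map to the same key, which is the outcome the improvement is after.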

      One place where this can come up is if the $lookup join predicate is expressed using $expr. As an example, consider this $lookup used in TPC-H query 18:

      db.getSiblingDB('tpch').orders.aggregate([
          {
              "$lookup": {
                  "from": "lineitem",
                  "let": {"o_orderkey": "$o_orderkey"},
                  "as": "lineitem",
                  "pipeline": [
                      {"$match": {"$expr": {"$eq": ["$$o_orderkey", "$l_orderkey"]}}},
                      {"$group": {"_id": "$l_orderkey", "sum(l_quantity)": {"$sum": "$l_quantity"}}},
                      {"$match": {"$expr": {"$gt": ["$sum(l_quantity)", 300]}}},
                      {"$project": {"_id": 0, "o_orderkey": "$_id", "sum(l_quantity)": 1}}
                  ]
              }
          },
          ...
      ]);
      

      For every document from the scan of the orders collection, a query is internally composed against the lineitem collection. This query will include the equality predicate "o_orderkey == l_orderkey", expressed using $expr. Each such query will have a different constant substituted for "o_orderkey", and therefore, without auto-parameterization of $expr, each will result in a different plan cache key. Note that this behavior will go away once we implement SERVER-69103, which will prevent SBE from being used on the inner side of a DocumentSourceLookup.
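To make the effect on the plan cache concrete, here is a minimal sketch that counts distinct keys for the inner-side queries. It assumes, purely for illustration, that a cache key is the serialized filter; `innerFilter` and `countDistinctKeys` are hypothetical names, and the sample order keys are made up.

```javascript
// Illustration only: models the cache key as the serialized inner-side filter.
// This is an assumption for the sketch, not the server's actual key encoding.
const sampleOrders = [{o_orderkey: 1}, {o_orderkey: 2}, {o_orderkey: 3}];

// Build the inner-side predicate with the outer document's value substituted
// for the "$$o_orderkey" variable.
function innerFilter(order) {
    return {$expr: {$eq: [order.o_orderkey, "$l_orderkey"]}};
}

function countDistinctKeys(orders, parameterize) {
    const keys = new Set();
    for (const order of orders) {
        const filter = innerFilter(order);
        if (parameterize) {
            // Model auto-parameterization: the constant becomes a placeholder.
            filter.$expr.$eq[0] = "?";
        }
        keys.add(JSON.stringify(filter));
    }
    return keys.size;
}

console.log(countDistinctKeys(sampleOrders, false)); // one key per distinct o_orderkey
console.log(countDistinctKeys(sampleOrders, true));  // a single shared key
```

Under this model, the number of cache entries grows with the number of distinct outer values unless the constant is parameterized.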

      In order to constrain the scope of this improvement, I imagine we would only implement it for simple equalities and inequalities expressed using $expr. I believe it will require work to make sure that the plans generated by sbe_stage_builder_expression.cpp refer to runtime environment slots that can be rebound rather than inlining constants into the plan.
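The difference between inlining a constant and reading it from a rebindable slot can be sketched conceptually as follows. Everything here is hypothetical: `buildInlinedPlan`, `buildParameterizedPlan`, and the `Map`-based environment are illustrative stand-ins and do not correspond to actual SBE classes or to the code in sbe_stage_builder_expression.cpp.

```javascript
// Conceptual sketch only; names and structure are hypothetical.

// The constant is baked into the "plan", so it is only valid for that one value.
function buildInlinedPlan(field, constant) {
    return doc => doc[field] === constant;
}

// The "plan" reads the value from an environment slot at execution time,
// so the same plan can be reused after rebinding the slot.
function buildParameterizedPlan(field, slotId) {
    return (doc, env) => doc[field] === env.get(slotId);
}

const runtimeEnv = new Map();
const cachedPlan = buildParameterizedPlan("a", "slot0");

runtimeEnv.set("slot0", "constant");
console.log(cachedPlan({a: "constant"}, runtimeEnv)); // matches

runtimeEnv.set("slot0", "other");                     // rebind, reuse the same plan
console.log(cachedPlan({a: "constant"}, runtimeEnv)); // no longer matches
```

A plan built with `buildInlinedPlan` would have to be rebuilt for every new constant, which is the situation the ticket wants to avoid for $expr predicates.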

            Assignee:
            backlog-query-execution [DO NOT USE] Backlog - Query Execution
            Reporter:
            david.storch@mongodb.com David Storch
            Votes:
            0
            Watchers:
            7

              Created:
              Updated: