Core Server / SERVER-102698

Moving $match before $group with compound _id is incorrect when predicate distinguishes equal values

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Optimization
    • ALL

      In classic, {$group: {_id: {a: "$a", b: "$b"}}} will swap with {$match: {"_id.a": {$type: [1]}, "_id.b": {$type: [1]}}}, which is incorrect.

      Background context: We try to move a $match ahead of a $group when the match predicates are only on the _id field (more specifically, when the _id field is just one or more fields coming from the preceding stage, as in {$group: {_id: "$a"}}). We also do this when the predicate is on a field that is computed from the same field as the _id field.
      Care is needed because $group semantics can change values. For instance, an existence predicate cannot be pushed ahead of a single-field $group with _id: "$fieldBlah", because a missing fieldBlah is materialized as null after the group, and null has a different meaning than missing for existence predicates. This ticket is related to SERVER-91102, which addressed a similar problem with type predicates.
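
      The missing-versus-null hazard described above can be illustrated with a small Python model. This is only a sketch of the semantics, not server code: group_by_field, exists_then_group, and group_then_exists are hypothetical helper names, and Python's None stands in for BSON null.

```python
def group_by_field(docs, field):
    """Sketch of {$group: {_id: "$<field>", count: {$sum: 1}}}."""
    counts = {}
    for doc in docs:
        # dict.get returns None for an absent key: a missing field is
        # materialized as null in the group key, as the ticket describes.
        key = doc.get(field)
        counts[key] = counts.get(key, 0) + 1
    return [{"_id": key, "count": n} for key, n in counts.items()]


def exists_then_group(docs, field):
    """Rewritten order: existence filter on the input field, then group."""
    return group_by_field([d for d in docs if field in d], field)


def group_then_exists(docs, field):
    """Original order: group, then an existence filter on _id."""
    return [g for g in group_by_field(docs, field) if "_id" in g]


exists_docs = [{"fieldBlah": 5}, {"other": 1}]
exists_original = group_then_exists(exists_docs, "fieldBlah")
exists_rewritten = exists_then_group(exists_docs, "fieldBlah")
print(exists_original)
print(exists_rewritten)
```

      In this model the original order keeps the null group produced by the document that lacks fieldBlah, while the rewritten order drops that document before grouping, so the two pipelines disagree.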

      The problem: This is basically a rehash of SERVER-91102. The approach in that PR was mostly correct, but it overlooked an early-return path through which the issue can still be reproduced. See: https://github.com/mongodb/mongo/commit/57ea94e671579eb25ed7a1c9b79435488e908124#diff-c82608f595683eaa864d90cd62fde2fb28de367c31d7af078a03cdc4f71cb491

      The early-return path is shown in the block below:

      if (thisGroup.getIdFields().size() != 1) {
        return true;
      }

      This early return is correct (in my understanding) in the case of $exists, where a compound _id on the group (e.g. `{$group: {_id: {a: "$a", b: "$b"}}}`) would lead to the missing values not being materialized as nulls, obviating the correctness issue of swapping an existence predicate ahead of the group (this is an assumption; I haven't verified it in classic). But in the case of $type, these sub-_id values will still be compared with type-insensitive equality, which presents a correctness issue.
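
      To make the $type case concrete, here is a minimal Python model of the two pipeline orders. It is a sketch under stated assumptions, not the server's implementation: group_by_compound_id, is_double, group_then_match, and match_then_group are hypothetical names, Python int and float stand in for BSON int and double, and the group is assumed to keep the first value it sees as the representative _id.

```python
def group_by_compound_id(docs):
    """Sketch of {$group: {_id: {a: "$a", b: "$b"}, count: {$sum: 1}}}."""
    groups = {}
    for doc in docs:
        # Python treats 1 and 1.0 as equal with equal hashes, which models
        # the type-insensitive equality used to compare the sub-_id values.
        key = (doc["a"], doc["b"])
        if key not in groups:
            # Assumption: the first-seen values become the group's _id.
            groups[key] = {"_id": {"a": doc["a"], "b": doc["b"]}, "count": 0}
        groups[key]["count"] += 1
    return list(groups.values())


def is_double(value):
    """Stand-in for a {$type: [1]} (double) predicate."""
    return isinstance(value, float)


def group_then_match(docs):
    """Original order: group, then type-match on _id.a and _id.b."""
    return [g for g in group_by_compound_id(docs)
            if is_double(g["_id"]["a"]) and is_double(g["_id"]["b"])]


def match_then_group(docs):
    """Rewritten order: type-match on a and b, then group."""
    kept = [d for d in docs if is_double(d["a"]) and is_double(d["b"])]
    return group_by_compound_id(kept)


type_docs = [{"a": 1, "b": 1}, {"a": 1.0, "b": 1.0}]
type_original = group_then_match(type_docs)
type_rewritten = match_then_group(type_docs)
print(type_original)
print(type_rewritten)
```

      In this model the original order returns no groups, because the int-valued document supplies the representative _id, while the rewritten order returns one group with a count of 1. A predicate that distinguishes values the group treats as equal therefore changes the result when it is moved ahead of the group.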

      Note to Implementer: We should probably update the comments around there to explain the early termination path when reimplementing. 

       

            Assignee: Sam Mercier (samuel.mercier@mongodb.com)
            Reporter: Sam Mercier (samuel.mercier@mongodb.com)
            Votes: 0
            Watchers: 5
