Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-78752

An incorrect plan can be written to the SBE plan cache for certain $or queries, leads to incorrect query results

    • Type: Icon: Bug Bug
    • Resolution: Fixed
    • Priority: Icon: Blocker - P1 Blocker - P1
    • 7.1.0-rc0, 7.0.0-rc8
    • Affects Version/s: 6.3.1, 7.0.0-rc7
    • Component/s: None
    • Query Execution
    • Fully Compatible
    • ALL
    • QE 2023-07-24
    • 167

      After analysis the following root cause is two fold:

      1) The Parameterization of a query abstracts dependencies between two $or branches in such a way that dependencies are lost. A query containing an $or statement with predicates on the same field of the same value are considered to be the same as a query where the 2 filters did not have the same value. Both queries would result in the same QueryShape which is semantically incorrect as the dependency are used in query optimizations. These two queries should have two different planhashes

      2) While checking for equality of 2 $or branches containing and IndexScan the equality function was missing to check the the IETs and 2 falsely considered equal IndexScans were merged together leading to a correctness issue. 

      Hi Team,

      I have identified an issue with the Slot-based Execution Engine where the Query Planner is selecting the wrong plan for a find() command (see details below), resulting in data inconsistency. In summary, it's implicitly adding a limit(1) to a query, causing the query to always return 1 document instead of the actual number of matched documents (2 expected). Before we found the actual culprit, one workaround was to do a test failover in Atlas also which would resolve the issue temporarily. 

      Scenario:

      • We created the corresponding aggregation pipeline for the same find() query, and it returned 2 documents as expected.
      • If we switch the position of any field in the query filter, it returns 2 documents as expected.
      • We looked at the PlanCache of the collection and found two queries identical to the problematic one, except that one of them has "limit(1)" which is the one the query planner kept selecting although we had not explicitly used limit in the query. - the culprit for this issue.
      • Because of this cached plan, the number of returned documents was always 1.
      • We ran the PlanCache.clearPlansByQuery() command to remove the query, and that solved the issue.

      I have also attached the Plan cache output which contains the bad plans along with the new plans (after we cleared the cache). Each section is labeled "old cache query plans" and "new cached query plans" respectively.

      Please let me know if you need any additional information. Thank you for your help.

      Best regards,
      Reginald

        1. old-and-new-query-plans.txt
          10 kB
          Reginald Chounoune
        2. summary_oldMethod
          31 kB
          Peter Volk
        3. summary_withNewMethod
          29 kB
          Peter Volk

            Assignee:
            david.storch@mongodb.com David Storch
            Reporter:
            reginald.chounoune@mongodb.com Reginald Chounoune
            Votes:
            0 Vote for this issue
            Watchers:
            35 Start watching this issue

              Created:
              Updated:
              Resolved: