Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-62150

SBE Multiplanning can be slow when suboptimal plan runs first

    • Type: Icon: Bug Bug
    • Resolution: Fixed
    • Priority: Icon: Critical - P2 Critical - P2
    • 8.0.0-rc0
    • Affects Version/s: 5.1.1, 5.2.0-rc1, 6.0.12, 7.0.4
    • Component/s: None
    • Query Execution
    • Fully Compatible
    • ALL
    • QE 2022-01-24, QO 2023-11-13, QO 2023-11-27

      Issue Status as of Apr 18, 2024

      ISSUE DESCRIPTION AND IMPACT

      Queries that use the Slot Based Execution (SBE) engine can experience long query optimization times (this ticket) and/or have non-optimal plans selected (SERVER-83196).

      This happens because SBE is designed for fast execution of the winning plan, instead of efficient round-robin execution of candidate plans during planning. Ultimately, SBE planning time can be proportional to the longest plan, instead of the shortest plan.

      Because of this inefficiency, the SBE planner also has less time available to gather information about the candidate plans, which can lead to a worse decision when the planning period ends.

      More technical details can be found in the this README.

      This issue has been fixed in MongoDB 8.0.0, which avoids using the SBE planner for multiplanning: instead the server always uses the Classic Engine for multiplanning, even when SBE is used to execute the winning plan.

      DIAGNOSIS AND AFFECTED VERSIONS

      The issue is present in MongoDB 6.0, 7.0 and 7.3. It is fixed in MongoDB 8.0.0.

      A query affected by this bug will use SBE and will spend a lot of time planning. Both symptoms are visible in "Slow query" log lines:

      • "queryFramework":"sbe" means the query is using SBE.
      • "planningTimeMicros" shows how much time was spent planning the query.

      When looking at an explain plan, the presence of a "slotBasedPlan" field means the query uses SBE.

      WORKAROUNDS

      As an immediate workaround on MongoDB versions affected by the bug, users can disable SBE by setting the internalQueryFrameworkControl parameter to “forceClassicEngine”. Since SBE often outperforms the Classic Engine, this option may affect the performance of queries which formerly executed in SBE.

      Another workaround is to hint the affected queries, using the hint() method, or using an index filter. An index filter allows you to hint a specific query without changing the application, but it only exists for the duration of the server process and does not persist after shutdown. Note that we plan to deprecate index filters starting in version 8.0, in favor of Persistent Query Settings (SERVER-17625).


      Original description:

      Currently, the strategy used in SBE multiplanning is as follows:

      • We run non blocking plans before blocking ones.
      • We run each plan’s trial period to completion (i.e. until we return 101 documents or we use up the plans budget). We use the number of reads performed by said plan to bound the number of reads used by any remaining plans.

      The problem with this approach is that if the first plan we run is not the optimal one, we are stuck running it and we can potentially use all of the reads. As an example, consider two plans, A and B. Plan A needs to perform 10k storage engine reads to get 101 documents, while plan B needs to perform 101 reads to get 101 documents. If Plan B runs first, we have no problems: we will set the reads limit for plan A to 101, and it will stop running after 101 reads. If Plan A runs first however, we will be stuck running plan A for all 10k reads.  Though we’ll eventually run plan B and it will be chosen, this negatively impacts the performance of queries which need to use the multiplanner.

            Assignee:
            ivan.fefer@mongodb.com Ivan Fefer
            Reporter:
            mihai.andrei@mongodb.com Mihai Andrei
            Votes:
            3 Vote for this issue
            Watchers:
            65 Start watching this issue

              Created:
              Updated:
              Resolved: