  1. Core Server
  2. SERVER-89468

Measure performance overhead of the SBE stage builders for queries from multi-planning benchmark

    • Type: Task
    • Resolution: Fixed
    • Priority: Major - P3
    • 8.1.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • None
    • Query Optimization
    • Fully Compatible
    • v8.0
    • QO 2024-04-15, QO 2024-04-29

      An analysis by david.percy@mongodb.com of the Simple.yml and VariedSelectivity.yml results found that the "mixed" multi-planning configuration is slower than pure classic by roughly 1.5 milliseconds. The proposed explanation is the extra cost of the SBE stage builders. In PERF-5248 we did some work to determine whether the stage builders do explain this overhead, but I'm not sure the results were ever conclusive.

      The stage builders taking 1.5 milliseconds for a relatively simple query (a conjunction with 64 leaf predicates) sounds somewhat high to me, even though we know that SBE stage building is a fairly heavy process. I think it's worth doing some additional investigation to determine whether the overhead of "mixed" vs. pure classic can indeed be explained by the stage builders.

      For this purpose, I suggest that we try to measure the latency of the stage building phase for the Simple.yml and/or VariedSelectivity.yml queries. We've brainstormed two ideas for how to achieve this:

      • Implement a Google Benchmark microbenchmark. If we choose this approach, we should probably move this ticket to the SERVER project, since it will involve a commit to the 10gen/mongo repo.
      • Do a one-off experiment where you instrument the code to read the timestamp counter in order to determine the number of CPU cycles/instructions taken on average by the SBE stage builders. You could check out the BenchmarkProfiler here for an idea of how to achieve this. My understanding is that, given the number of cycles spent on stage building per query, you might not be able to convert this very accurately into wall clock time, but you could presumably get a good guesstimate using the CPU frequency.

            Assignee:
            daniel.segel@mongodb.com Daniel Segel
            Reporter:
            david.storch@mongodb.com David Storch
            Votes:
            0
            Watchers:
            5
