Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-91192

Investigate performance improvement of BatchedUpdate stage

    • Type: Icon: Improvement Improvement
    • Resolution: Unresolved
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • Storage Execution

      Consider a set of updates in a single update command.

      Each update is a pair: [\{q1, u1\}, \{q2, u2\}, ... \{qN, uN\}] - apply update ui to all documents that match qi.

      Right now we are doing it in sequence: fetch q1, apply u1, fetch q2, apply u2 and so on.

      What if we do the following:

      1. Create big query Q = {$or: [q1, q2, .. qN]}.
      2. Start by marking each update as not executed.
      3. For each document D, returned by Q:
        1. For each non executed update {qi, ui}, if document matches qi, apply update ui and mark it as executed.
        2. Proceed with updated document. So if a document D initially matches q1 and after applying u1 it starts to match q2, we will also apply u2 to it as well.
      4. After Q is exhausted, for each not executed update with upsert: true, insert new document.

      This might save us from an overhead of running N separate queries (and possibly multi-planning each separately), but will also force us to apply all N predicates to every possible document, leading to O(N^2) complexity.

      But for small N <= 10 it might be beneficial.
      It can also improve performance of write operations like $merge.

            Assignee:
            Unassigned Unassigned
            Reporter:
            ivan.fefer@mongodb.com Ivan Fefer
            Votes:
            0 Vote for this issue
            Watchers:
            15 Start watching this issue

              Created:
              Updated: