Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: None
Team: Storage Execution
Consider a set of updates in a single update command.
Each update is a pair: [{q1, u1}, {q2, u2}, ..., {qN, uN}] - apply update ui to all documents that match qi.
Right now we execute them sequentially: fetch q1, apply u1; fetch q2, apply u2; and so on.
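The current sequential behavior can be sketched roughly as follows, assuming an in-memory list of dicts, equality-only predicates, and $set-only updates; all names here are illustrative, not actual server code.

```python
def matches(doc, query):
    """Equality-only predicate: every key in `query` must equal the doc's value."""
    return all(doc.get(k) == v for k, v in query.items())

def execute_sequentially(collection, updates):
    """updates: list of (qi, ui) pairs; one scan of the collection per pair."""
    for qi, ui in updates:
        # "fetch qi, apply ui" - each update runs its own query.
        for doc in collection:
            if matches(doc, qi):
                doc.update(ui.get("$set", {}))
                break  # default (non-multi) update touches one document
    return collection
```

This is N independent query executions, which is the overhead the proposal below tries to amortize.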
What if we do the following:
- Create a big query Q = {$or: [q1, q2, ..., qN]}.
- Start by marking each update as not executed.
- For each document D returned by Q:
  - For each not-yet-executed update {qi, ui}: if the document matches qi, apply update ui and mark the update as executed.
  - Proceed with the updated document. So if a document D initially matches q1 and, after applying u1, starts to match q2, we apply u2 to it as well.
- After Q is exhausted, insert a new document for each not-executed update that has upsert: true.
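The steps above can be sketched in Python, again assuming an in-memory list of dicts, equality-only predicates, and $set-only updates; the function names are illustrative. Note that, as written, an update is skipped once it has been marked executed, which mirrors default (non-multi) update semantics.

```python
def matches(doc, query):
    """Equality-only predicate: every key in `query` must equal the doc's value."""
    return all(doc.get(k) == v for k, v in query.items())

def apply_update(doc, update):
    """$set-only update, applied in place."""
    doc.update(update.get("$set", {}))

def execute_batched_updates(collection, updates):
    """updates: list of (qi, ui, upsert) triples."""
    executed = [False] * len(updates)

    # A single scan models the big query Q = {$or: [q1, ..., qN]}: each
    # document is tested against every not-yet-executed predicate.
    for doc in collection:
        for i, (qi, ui, _) in enumerate(updates):
            if not executed[i] and matches(doc, qi):
                apply_update(doc, ui)  # proceed with the updated document,
                executed[i] = True     # so a later qi may now match it too

    # After Q is exhausted, upsert any update that never matched a document.
    for (qi, ui, upsert), done in zip(updates, executed):
        if not done and upsert:
            new_doc = dict(qi)       # seed the upserted doc from the predicate
            apply_update(new_doc, ui)
            collection.append(new_doc)
    return collection
```

The inner loop over all N predicates per document is where the O(N^2) cost discussed below comes from.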
This might save us the overhead of running N separate queries (and possibly multi-planning each one separately), but it also forces us to apply all N predicates to every candidate document, leading to O(N^2) complexity.
For small N (<= 10), however, it might still be beneficial.
It could also improve the performance of write operations like $merge.
Related issues:
- is related to SERVER-91191: Re-use CollectionAcquisition in write_ops_exec::performUpdates (Backlog)