-
Type: Improvement
-
Resolution: Unresolved
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
Query Execution
-
QE 2023-08-21, QE 2023-09-04, QE 2023-09-18, QE 2023-10-02, QE 2023-10-16, QE 2023-10-30
in this group query we are grouping several fields with a common subpath
db.a.explain().aggregate({$match: {"magic.position": "air"}}, {$group: {_id: {a: "$foo.b.c", d: "$foo.e.f", g: "$foo.h.i"}}}).queryPlanner.winningPlan.slotBasedPlan.stages [2] mkbson s13 [_id = s12] true false [2] project [s12 = newObj("a", s9, "d", s10, "g", s11)] [2] group [s9, s10, s11] [] [2] project [s9 = traverseP(s4, lambda(l101.0) { traverseP(getField(move(l101.0), "b"), lambda(l102.0) { getField(move(l102.0), "c") }, 1) }, 1), s10 = traverseP(s4, lambda(l101.0) { traverseP(getField(move(l101.0), "e"), lambda(l102.0) { getField(move(l102.0), "f") }, 1) }, 1), s11 = traverseP(s4, lambda(l101.0) { traverseP(getField(move(l101.0), "h"), lambda(l102.0) { getField(move(l102.0), "i") }, 1) }, 1)] [1] filter {traverseF(s5, lambda(l1.0) { traverseF(getField(l1.0, "position"), lambda(l2.0) { ((l2.0 == s8) ?: false) }, false) }, false)} [1] scan s6 s7 none none none none lowPriority [s4 = foo, s5 = magic] @"5ddad002-6f31-4499-a881-8cb09547796e" true false
each field on the sub-object "foo" has their own traverse. Even worse if there was a common subpath after foo (eg "foo.bar.a", "foo.bar.b", "foo.bar.c") we traverse for the bar field multiple times, when it could just be done once. These n^2 lookups become more painful when documents are large and there are many paths to lookup.
One solution is to try to generate plans that look more like this projection query.
db.a.explain().aggregate({$project: {"foo.b.c": 1, "foo.e.f": 1, "foo.h.i": 1, _id: 0}}).queryPlanner.winningPlan.slotBasedPlan.stages [2] project [s6 = traverseP(s4, lambda(l1.0) { if isObject(l1.0) then makeBsonObj(MakeObjSpec(keep, [], ["foo"]), l1.0, traverseP(getField(l1.0, "foo"), lambda(l2.0) { if isObject(l2.0) then makeBsonObj(MakeObjSpec(keep, [], ["b", "e", "h"]), l2.0, traverseP(getField(l2.0, "b"), lambda(l3.0) { if isObject(l3.0) then makeBsonObj(MakeObjSpec(keep, ["c"], []), l3.0) else Nothing }, Nothing), traverseP(getField(l2.0, "e"), lambda(l4.0) { if isObject(l4.0) then makeBsonObj(MakeObjSpec(keep, ["f"], []), l4.0) else Nothing }, Nothing), traverseP(getField(l2.0, "h"), lambda(l5.0) { if isObject(l5.0) then makeBsonObj(MakeObjSpec(keep, ["i"], []), l5.0) else Nothing }, Nothing)) else Nothing }, Nothing)) else Nothing }, Nothing)] [1] scan s4 s5 none none none none lowPriority [] @"5ddad002-6f31-4499-a881-8cb09547796e" true false
this still however has a similar problem where each call to getField is done independently and can result in the full bson or subobject in the bson being scanned multiple times.
there is logic in the classic engine here to avoid scanning a document multiple times if possible while doing an inclusion projection https://github.com/mongodb/mongo/blob/2e0259b3050e4c27d47e353222395d21bb80b9e4/src/mongo/db/exec/fastpath_projection_node.h#L88
- is depended on by
-
SERVER-84021 Depricate $_internalShredDocuments
- Backlog
- is duplicated by
-
SERVER-83533 Improve efficiency of SBE when running projections with same prefix
- Closed
- is related to
-
SERVER-94416 Add scenario that accesses many subfields with same parent field
- Backlog
-
SERVER-79975 [Classic] Optimize large group keys
- Closed
- related to
-
SERVER-79975 [Classic] Optimize large group keys
- Closed