Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: None
Query Optimization
Context:
Consider the following aggregation query:
[
  { $addFields: { flattened: { $map: { input: "$outer", as: "iter", in: "$$iter.inner" } } } },
  { $match: { flattened: { $elemMatch: { $eq: true } } } }
]
Currently, the pipeline optimisation rules try to move the $match stage ahead of the $addFields stage, which implies renaming "flattened" to "outer.inner". As a result, the filter generated for the underlying find-land query looks like:
{filter: { "outer.inner": {$elemMatch: {$eq: true}}}}
This filter is wrong.
The field outer is an array, so the path outer.inner traverses that array. Consequently, {$elemMatch: {$eq: true}} is evaluated against the individual inner values, which are not arrays, and this differs from the original behaviour.
Proposed solution:
As asya.kamsky@mongodb.com proposed, it's possible to rename dotted $elemMatch expressions in a different way such that the semantics are preserved after the expression rewrite.
Instead of generating
"outer.inner":{$elemMatch:{$eq:true}}
the rewrite engine could output
"outer":{$elemMatch:{inner:true}}
As things currently stand, the renaming implementation doesn't have sufficient context to distinguish user-provided dots from $map-generated ones. This could be addressed in a series of sub-tasks:
Step 1.0: Introduce a new type: expression::RenameMapping
- Currently, renames are represented as a StringMap<std::string>.
- This is not expressive enough. We need to know which dot was introduced by $map.
- For now we could alias it to absl::flat_hash_map<FieldRef, FieldRef>.
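A minimal sketch of what such an alias could look like, using only the standard library. `PathRef` and `PathRefHash` are hypothetical stand-ins for the server's FieldRef and its hasher, and `std::unordered_map` stands in for absl::flat_hash_map; none of this is the actual server code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the server's FieldRef: a path split on dots.
struct PathRef {
    std::vector<std::string> parts;

    // Split a dotted path such as "outer.inner" into its parts.
    static PathRef parse(const std::string& dotted) {
        PathRef ref;
        std::stringstream stream(dotted);
        std::string part;
        while (std::getline(stream, part, '.')) {
            ref.parts.push_back(part);
        }
        return ref;
    }

    // Join the parts back into a dotted path.
    std::string dotted() const {
        std::string out;
        for (std::size_t i = 0; i < parts.size(); ++i) {
            if (i > 0) out += '.';
            out += parts[i];
        }
        return out;
    }

    bool operator==(const PathRef& other) const { return parts == other.parts; }
};

// Hash on the dotted form so PathRef can be used as a map key.
struct PathRefHash {
    std::size_t operator()(const PathRef& ref) const {
        return std::hash<std::string>{}(ref.dotted());
    }
};

// Stand-in for the proposed expression::RenameMapping alias.
using RenameMapping = std::unordered_map<PathRef, PathRef, PathRefHash>;
```

With this shape, the rename from the example above would be stored as `renames[PathRef::parse("flattened")] = PathRef::parse("outer.inner")`, and the target keeps its parts separate instead of being a flat string.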
Step 2: Make FieldRef aware of which part was generated by $map.
Why: if the dot was generated by $map, outer.inner should be split into [<elemMatch path>, <child prefix>].
- Both parts could have multiple dots, which weren't necessarily introduced by $map.
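One way to picture the split, as a sketch only: `MapAwarePath`, `mapBoundary`, `joinParts` and `splitAtMapBoundary` are hypothetical names, and the assumption is that the path remembers how many leading parts precede the $map-generated dot. How the real FieldRef would record this is not specified in this ticket.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical path type that remembers where $map introduced a dot.
struct MapAwarePath {
    std::vector<std::string> parts;
    // Number of leading parts before the $map-generated dot.
    // nullopt means every dot in the path was provided by the user.
    std::optional<std::size_t> mapBoundary;
};

// Join parts[begin, end) with dots.
std::string joinParts(const std::vector<std::string>& parts,
                      std::size_t begin,
                      std::size_t end) {
    std::string out;
    for (std::size_t i = begin; i < end; ++i) {
        if (i > begin) out += '.';
        out += parts[i];
    }
    return out;
}

// Returns {elemMatch path, child prefix}, or nullopt if there is no usable
// $map-generated dot to split at.
std::optional<std::pair<std::string, std::string>> splitAtMapBoundary(
    const MapAwarePath& path) {
    if (!path.mapBoundary || *path.mapBoundary == 0 ||
        *path.mapBoundary >= path.parts.size()) {
        return std::nullopt;
    }
    return std::make_pair(joinParts(path.parts, 0, *path.mapBoundary),
                          joinParts(path.parts, *path.mapBoundary, path.parts.size()));
}
```

For a path with parts `a`, `outer`, `inner`, `x` and a boundary of 2, this yields the pair `a.outer` and `inner.x`, so both halves keep their own user-provided dots.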
Step 3.0: Update the renaming algorithm.
- Make wouldRenameSucceed generate multiple Renamables. [Source]
- Currently wouldRenameSucceed can only generate one rewritten path. This won't suffice.
- If we encounter a $elemMatch with a $map-generated dot:
  - Split the rename into head (the rename for the $elemMatch) and tail (the prefix for its children).
  - Recurse into the children with a new RenameMapping (see Step 1.1) that prefixes the children's paths with tail.
  - This should be done via hasOnlyRenameableMatchExpressionChildren, which, contrary to its name, also mutates the list of renamables.
- Otherwise keep the current behaviour.
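The head/tail handling described in this step can be sketched on a toy model. Everything here is hypothetical: `ChildPredicate`, `ElemMatchNode`, `RenameTarget`, `renameElemMatch` and `render` are illustrative names, not the server's MatchExpression classes, and predicates are kept as plain strings.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical child of a $elemMatch.
struct ChildPredicate {
    std::optional<std::string> path;  // nullopt for value-style children such as $eq
    std::string predicate;            // operator body, e.g. "$eq: true"
};

// Hypothetical, simplified $elemMatch node.
struct ElemMatchNode {
    std::string path;
    std::vector<ChildPredicate> children;
};

// Hypothetical rename target. `tail` is set only when the dot came from $map;
// otherwise `head` is simply the full new path.
struct RenameTarget {
    std::string head;
    std::optional<std::string> tail;
};

// Rename the $elemMatch to `head`; if there is a `tail`, prefix every child with it.
ElemMatchNode renameElemMatch(const ElemMatchNode& node, const RenameTarget& target) {
    ElemMatchNode renamed;
    renamed.path = target.head;
    for (const ChildPredicate& child : node.children) {
        ChildPredicate out = child;
        if (target.tail) {
            out.path = child.path ? *target.tail + "." + *child.path : *target.tail;
        }
        renamed.children.push_back(out);
    }
    return renamed;
}

// Render the node as filter text for inspection.
std::string render(const ElemMatchNode& node) {
    std::string out = "\"" + node.path + "\": {$elemMatch: {";
    for (std::size_t i = 0; i < node.children.size(); ++i) {
        if (i > 0) out += ", ";
        const ChildPredicate& child = node.children[i];
        out += child.path ? ("\"" + *child.path + "\": {" + child.predicate + "}")
                          : child.predicate;
    }
    return out + "}}";
}
```

Applied to a node on `flattened` with a single `$eq: true` child and a target of head `outer`, tail `inner`, the rendered result is `"outer": {$elemMatch: {"inner": {$eq: true}}}`, which is the explicit-`$eq` form of the proposed `"outer":{$elemMatch:{inner:true}}`.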
Step 1.1: Extend expression::RenameMapping to support adding prefixes.
- This could simply be represented by a single FieldRef.
- This type of rename mapping should:
  - When faced with a non-existent path (boost::none), set it to prefix.
  - When faced with an existing path, prefix it with prefix.
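The two rules above are small enough to sketch directly. `PrefixRenameMapping` is a hypothetical name, `std::optional` stands in for the boost::optional / boost::none used in the server, and a plain string stands in for FieldRef.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical prefix-style rename mapping holding a single prefix.
struct PrefixRenameMapping {
    std::string prefix;

    // No path: the child takes the prefix as its path.
    // Existing path: the prefix is prepended with a dot.
    std::string apply(const std::optional<std::string>& path) const {
        if (!path) return prefix;
        return prefix + "." + *path;
    }
};
```

With a prefix of `inner`, a path-less child ends up at `inner`, while a child already at `x.y` ends up at `inner.x.y`.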
Step 3.1: Extend the renaming algorithm to support adding prefixes.
- We can't really do that now if the path is boost::none. [Source]
- related to SERVER-90869 Disallow dotted full-path renames for '$elemMatch' expressions (Closed)