Type: Improvement
Resolution: Fixed
Priority: Major - P3
Affects Version/s: None
Component/s: Query Planning
None
Fully Compatible
v7.2, v7.0, v6.0, v5.0
QO 2024-02-05
135
As part of MatchExpression::optimize(), we have logic that tries to rewrite an $or of equalities over the same path into an $in. This is advantageous because it helps the downstream optimization code produce better plans. Here's an example of the rewrite:
// Original match expression.
{$or: [{name: "Don"}, {name: "Alice"}]}
// This gets rewritten to the following.
{name: {$in: ["Alice", "Don"]}}
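To make the rewrite concrete, here is a minimal Python sketch that models match expressions as plain dicts. It is not the server's C++ implementation: the helper names `is_equality` and `or_of_equalities_to_in` are hypothetical, only scalar equalities are handled, and the $in values are sorted and de-duplicated purely to mirror the example output above.

```python
def is_equality(clause):
    """Hypothetical helper: treat a one-field clause whose value is a scalar
    (not an operator document or an array) as an equality predicate."""
    if len(clause) != 1:
        return False
    (value,) = clause.values()
    return not isinstance(value, (dict, list))


def or_of_equalities_to_in(expr):
    """Rewrite {$or: [{p: v1}, {p: v2}, ...]} to {p: {$in: [...]}} when every
    clause is an equality on the same path p; otherwise return expr unchanged."""
    if set(expr) != {"$or"}:
        return expr
    clauses = expr["$or"]
    if not all(is_equality(c) for c in clauses):
        return expr
    paths = {next(iter(c)) for c in clauses}
    if len(paths) != 1:
        return expr
    (path,) = paths
    # Sort and de-duplicate the values, mirroring the example output above.
    values = sorted({c[path] for c in clauses})
    return {path: {"$in": values}}


original = {"$or": [{"name": "Don"}, {"name": "Alice"}]}
rewritten = or_of_equalities_to_in(original)
# rewritten == {"name": {"$in": ["Alice", "Don"]}}
```

An $or whose equalities are on different paths, such as `{"$or": [{"name": "Don"}, {"age": 42}]}`, is returned unchanged by this sketch.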
When only some of the $or's clauses can be rewritten in this manner, the current implementation can output a match expression tree in which an $or is a direct child of another $or. Here's an example:
// Original match expression.
{$or: [{name: "Don"}, {name: "Alice"}, {age: 42}, {job: "Software Engineer"}]}
// This gets rewritten to the following.
{$or: [{name: {$in: ["Alice", "Don"]}}, {$or: [{age: 42}, {job: "Software Engineer"}]}]}
As you can see, one of the direct children of the outer $or is another $or. A separate rewrite in MatchExpression::optimize() attempts to flatten such nested $or nodes. However, the $or -> $in rewrite runs afterwards, so the nested $or it produces is never flattened. In the master branch, the nested $or is subsequently simplified by the new boolean simplification module enabled in SERVER-81630, but in older branches it is never simplified.
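The ordering problem can be reproduced in a simplified dict model. Here `flatten_or` stands in for the flattening rewrite and `or_to_in_nesting` for an $or -> $in rewrite that wraps the leftover clauses in a new $or. Both names are hypothetical, and grouping on the first equality's path is an assumption made only for illustration; the real rewrite in MatchExpression::optimize() may choose paths differently.

```python
def is_equality(clause):
    # Hypothetical helper: a one-field clause whose value is a plain scalar.
    if len(clause) != 1:
        return False
    (value,) = clause.values()
    return not isinstance(value, (dict, list))


def flatten_or(expr):
    """Lift the children of any $or that is a direct child of another $or."""
    if set(expr) != {"$or"}:
        return expr
    flat = []
    for child in expr["$or"]:
        child = flatten_or(child)
        if set(child) == {"$or"}:
            flat.extend(child["$or"])
        else:
            flat.append(child)
    return {"$or": flat}


def or_to_in_nesting(expr):
    """Model of a rewrite that nests: equalities on one path become an $in,
    and the remaining clauses are wrapped in a *new* $or."""
    if set(expr) != {"$or"}:
        return expr
    clauses = expr["$or"]
    eqs = [c for c in clauses if is_equality(c)]
    if not eqs:
        return expr
    path = next(iter(eqs[0]))  # assumption: group on the first equality's path
    matched = [c for c in eqs if path in c]
    if len(matched) < 2:
        return expr
    rest = [c for c in clauses if not (is_equality(c) and path in c)]
    in_clause = {path: {"$in": sorted({c[path] for c in matched})}}
    if not rest:
        return in_clause
    return {"$or": [in_clause, {"$or": rest}]}


query = {"$or": [{"name": "Don"}, {"name": "Alice"},
                 {"age": 42}, {"job": "Software Engineer"}]}

# Flattening runs first and the rewrite second, so the nested $or survives:
# {"$or": [{"name": {"$in": ["Alice", "Don"]}},
#          {"$or": [{"age": 42}, {"job": "Software Engineer"}]}]}
nested = or_to_in_nesting(flatten_or(query))

# A second flattening pass would remove the nesting:
# {"$or": [{"name": {"$in": ["Alice", "Don"]}}, {"age": 42}, {"job": "Software Engineer"}]}
reflattened = flatten_or(nested)
```

In this model, a second flattening pass after the rewrite removes the nesting, which matches the effect the ticket attributes to boolean simplification on master.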
I would argue that, even with boolean simplification available, we should modify the implementation of the $or -> $in rewrite to avoid constructing directly nested $or nodes. Continuing the example above, the output of the rewrite should be as follows:
{$or: [{name: {$in: ["Alice", "Don"]}}, {age: 42}, {job: "Software Engineer"}]}
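Under the same simplified dict model and assumptions (hypothetical helper names, grouping on the first equality's path), the proposed behavior only changes the last step: the leftover clauses are spliced into the outer $or next to the new $in instead of being wrapped in a nested $or. This is a sketch of the idea, not the actual server change.

```python
def is_equality(clause):
    # Hypothetical helper: a one-field clause whose value is a plain scalar.
    if len(clause) != 1:
        return False
    (value,) = clause.values()
    return not isinstance(value, (dict, list))


def or_to_in_flat(expr):
    """Sketch of the proposed rewrite: equalities on one path collapse into a
    single $in, and every other clause stays a direct child of the same $or."""
    if set(expr) != {"$or"}:
        return expr
    clauses = expr["$or"]
    eqs = [c for c in clauses if is_equality(c)]
    if not eqs:
        return expr
    path = next(iter(eqs[0]))  # assumption: group on the first equality's path
    matched = [c for c in eqs if path in c]
    if len(matched) < 2:
        return expr
    in_clause = {path: {"$in": sorted({c[path] for c in matched})}}
    # Splice the untouched clauses in next to the $in instead of nesting them.
    rest = [c for c in clauses if not (is_equality(c) and path in c)]
    return {"$or": [in_clause, *rest]} if rest else in_clause


query = {"$or": [{"name": "Don"}, {"name": "Alice"},
                 {"age": 42}, {"job": "Software Engineer"}]}
result = or_to_in_flat(query)
# result == {"$or": [{"name": {"$in": ["Alice", "Don"]}},
#                    {"age": 42}, {"job": "Software Engineer"}]}
```

Because this version never emits an $or directly under another $or, its output does not depend on a later flattening pass or on boolean simplification.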
Related issues:
- SERVER-83091 $or query can trigger an infinite loop during plan enumeration (Closed)
- SERVER-84013 Incorrect results for index scan plan on query with duplicate predicates in nested $or (Closed)