-
Type: Bug
-
Resolution: Fixed
-
Priority: Critical - P2
-
Affects Version/s: 7.1.0, 7.0.4, 7.3.0-rc0, 7.2.0
-
Component/s: None
-
Query Optimization
-
Fully Compatible
-
ALL
-
v7.2, v7.0
-
-
QE 2023-12-25, QO 2024-01-08, QO 2024-01-22
-
162
Observable Problem
When running a query/aggregation with a $or, the SBE plan cache entry may store an invalid plan. This results in subsequent executions of the query, with different constants, returning incorrect results. In particular, the cached plan may incorrectly filter out documents which should be returned. This problem is specific to SBE and the SBE plan cache. The classic engine is not affected.
Under 7.0.4, 7.1,7.2, this bug affects find() commands, since SBE is on by default.
In the v7.0 branch and master branch, this bug affects only aggregations, since SBE use is restricted to a smaller set of queries (see SERVER-83685).
Reproduction with resmoke
To reproduce the bug, see SERVER-84338-repro-2.js and run it with resmoke:
python3 buildscripts/resmoke.py run --suites=no_passthrough_with_mongod SERVER-84338-repro-2.js
The script will run several tests against mongod, reconfiguring its internalQueryFrameworkControl value to trySbeEngine, trySbeRestricted and forceClassicEngine. At the end, it will print out a list of queries and configurations that produced incorrect results.
Under default configuration, which is trySbeRestricted, both 7.0 and master can produce incorrect results for an aggregation of this form:
[ { "$match" : { "$or" : [ { "_id" : 2, "foo" : 2 }, { "_id" : 2, "foo" : 999 } ] } }, { "$count" : "count" } ]
Explanation of the bug
The bug has to do with the interaction between top-level $ors, the $or -> $in rewrite, and the plan cache.
A query with the filter of the form {$or: {_id: 1, foo: 1}, {_id: 1, foo:999}} comes in, and is optimized/normalized, and parameterized as usual. Optimizing this expression does not change the expression (the _id predicate is not pulled up).
Each branch of the OR is then planned separately, using an index scan on _id and a residual filter on foo.
The planner then realizes that the scans generated beneath the OR are identical, and collapses them into one in QueryPlannerAccess::collapseEquivalentScans().
When collapsing the scans, we combine the residual filters on foo into
{ $or: [ { foo: { $eq: 999.0 } }, { foo: { $eq: 1.0 } } ] }
We then optimize this expression here which converts the $or to a $in here. The optimized residual predicate is:
{foo: {$in: [1, 999]}}
The resulting $in does not preserve the parameter IDs assigned to each predicate on foo. This is the crux of the problem: that parameterization information has effectively been "lost."
At this point, the $in right hand side is effectively "baked" into the query plan which is cached. Subsequent executions will use 1, 999 in the residual predicate even if those are not the values provided.
The attached SERVER-84338-repro-2.js contains more details, as well as some information about queries which are similar but do not exhibit the bug.
Next Steps
The remaining tasks here are:
(a) Determine the expected/desired interaction between match expression optimization and parameterization. This was probably discussed during the plan cache development. The repro script includes a similar situation where the bug does not reproduce, but it is not clear whether this is intentional, or whether it is just by chance.
(b) Fix the bug.
(c) Determine a testing strategy so we can catch this type of bug on our own.
(d) Investigate whether there are other cases where we can store an invalid plan in the cache. Any time we optimize a match expression after parameterization, we are at risk of this bug occurring, since the optimization can effectively "lose" parameter markings. I did a quick check of the planning code and couldn't find other places where we call MatchExpression::optimize() besides the on linked above when collapsing scans beneath an OR.
Original Description Provided by Customer
Given a query, `findOne({$or: [query, query]})` does not yield the same results as `findOne(query)`.
It works fine in 6.0
Description of Attached Files
index.mjs: Original reproduction provided by customer
SERVER-84338-repro.js: Reproduction script from Kyle Suarez, which runs the server under several configurations.
SERVER-84338-repro-2.js: Final repro script from Ian B, which supersedes the prior two. This file includes detailed notes on the bug, along with some different queries which do not exhibit the bug. This file, along with the resmoke command in the description, should be all that is needed to reproduce and understand the bug.
- related to
-
SERVER-84728 Allow MatchExpression $or -> $in rewrite to preserve parameters
- Backlog