-
Type: Improvement
-
Resolution: Done
-
Priority: Major - P3
-
Affects Version/s: 2.4.11, 2.6.4
-
Component/s: Performance, Querying
-
None
-
Major Change
-
Query 16 (06/24/16)
-
(copied to CRM)
ISSUE SUMMARY
In versions of MongoDB 3.2 and earlier, the metadata associated with an index contains a single bit to indicate whether or not the index is multikey. There is no granular information to indicate which indexed fields cause the index to be multikey. This has implications for compound multikey indexes because it prevents the query planner from using the index as efficiently as theoretically possible (i.e. given perfect knowledge about the structure of the documents in the index).
Support for tracking which indexed fields cause the index to be multikey (aka "path-level multikey tracking") has been implemented in MongoDB 3.3.8. The query planner then uses this information to get tighter bounds on queries by assigning additional predicates to the index.
SUPPORTED STORAGE ENGINES
This feature is supported for the WiredTiger and InMemory storage engines only.
This feature is not supported for the MMAPv1 storage engine. MMAPv1 support for path-level multikey information is tracked in SERVER-22727.
IMPACT ON DOWNGRADE
It is possible to downgrade from MongoDB 3.3.8 to MongoDB 3.2 only without first dropping all of the indexes that were created on MongoDB 3.3.8. However, it is only possible downgrade to MongoDB 3.2.7 or newer. Downgrading to MongoDB 3.2.7 will automatically delete the path-level multikey information from the index metadata. The indexes will retain only the index-level flag that indicates whether the index is multikey.
Downgrading from MongoDB 3.3.8 to a version of MongoDB older than 3.2.7 may produce the following assertion message if indexes contain path-level multikey information:
2016-05-31T13:18:11.090-0400 I - [initandlisten] Assertion: 13111:wrong type for field (ns) 10 != 2
See SERVER-23117 for more information on this downgrade requirement.
TECHNICAL DETAILS
Tighter bounds in 3.3.8+
Using the path-level multikey information we track for an index, we can search a bounded interval (thus having intersected the bounds) on a field in the compound multikey index that doesn't ever refer to an array value contain multiple elements. In the following example, the "a" field is a scalar and the "b" field is an array value containing multiple elements.
Index: {a: 1, b: 1} Document: {a: 5, b: [1, 2, 3]} ⇒ Multikey paths: ["b"] Query: {a: {$gte: 0, $lt: 10}} ⇒ Bounds on "a": [0, 10) ⇒ Bounds on "b": [MinKey, MaxKey]
SERVER-22401 has additional examples of what bounds can be generated for compound multikey indexes given a document structure and query.
Contrasted with 3.2 and earlier
In MongoDB 3.2 and earlier, the query planner doesn't have enough information to know that the "a" field is never an array value in some document in the collection (e.g. that the document {a: [1, 2, 3], b: 5} doesn't also exist in the collection). Without this additional information, the query planner must assume that the {a: {$gte: 0}} condition could apply to a different element than the {a: {$lt: 10}} condition. Thus, we can only search a left-bounded or right-bounded interval on the "a" field and apply the other condition as a subsequent filter:
{ "stage" : "FETCH", "filter" : { "a" : { "$gte" : 0 } }, "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "a" : 1, "b" : 1 }, "indexName" : "a_1_b_1", "isMultiKey" : true, ... "indexBounds" : { "a" : [ "[-inf.0, 10.0)" ], "b" : [ "[MinKey, MaxKey]" ] } } }
Explain output in 3.3.8+
The explain output for the COUNT_SCAN, DISTINCT_SCAN, and IXSCAN stages has been updated to include an additional "multiKeyPath" field for indexes that support this feature. The "multiKeyPath" field describes what prefixes of the indexed fields cause the index to be multikey:
{ "stage" : "IXSCAN", "keyPattern" : { "a" : 1, "b" : 1 }, "indexName" : "a_1_b_1", "isMultiKey" : true, "multiKeyPaths" : { "a" : [ ], "b" : [ "b" ] }, ... }
SERVER-23115 has additional information regarding the "mulitKeyPaths" field in the explain output.
Updated description
Suppose you have documents of the form
{a: [{b: 1}, {b: 2}, {b: 3}]} {a: [{b: 4}, {b: 5}, {b: 6}]}
with the index {"a.b": 1}. With this schema, there is currently no way to constrain the index bounds with multiple predicates. Users who need to do this generally must change their schema. Instead we could provide new query language semantics to support this use case. See the comments below or related tickets SERVER-8610, SERVER-7959, and SERVER-6050 for more details.
Original Description
It looks like there is unexpected slow performance on a range query when using $elemMatch. I first encountered it when using the aggregation framework and using the $match pipeline with $elemMatch, and have now backed it out into an example with $find.
The performance hit is quite large, 20x slower or so, when I believe, if anything, the query should actually run FASTER, not 20x slower.
I could try and troubleshoot more, but I don't totally understand the explain() output when doing ranges of dates. Sometimes explain() puts the indexBounds as 'true' in the explain(), and sometimes explain() put the indexBounds as 'ISODate("0NaN-NaN-NaNTNaN:NaN:NaNZ")', and I"m not 100% sure how to interpret that, other than I believe the equivalent is "no bound" (meaning the query is not bounding itself at the top or bottom of the range properly).
- duplicates
-
SERVER-8610 $elemMatch (objects in array) not using index range correctly
- Closed
- is duplicated by
-
SERVER-7959 Potentially unexpected scans with compound indexes when some fields are multikey
- Closed
-
SERVER-10436 wrong index ranges when using compound index on a list
- Closed
-
SERVER-14618 Wrong index bounds when using "hint"
- Closed
-
SERVER-15059 index on date range on embedded collection is not used correctly
- Closed
-
SERVER-20302 Strange execution plan for range queries with $elemMatch
- Closed
-
SERVER-23038 Why index dosn't working?
- Closed
-
SERVER-23079 Why sub document cannot match index?
- Closed
-
SERVER-24786 Compound multikey index does not compound indexbounds when nested in an object
- Closed
-
SERVER-26978 Query Optimizer not using index correctly
- Closed
-
SERVER-27386 Poor use of index
- Closed
-
SERVER-28010 Creating index on array of sub document fields causes unexpected bounds from index use in explain
- Closed
-
SERVER-4468 Keep track of multi-key fields in index
- Closed
-
SERVER-6050 Consider allowing $elemMatch applied to non arrays
- Closed
-
SERVER-14944 Use of Indexes for Query
- Closed
- is related to
-
SERVER-21178 Super slow query and increased memory usage on inefficient range queries
- Closed
-
SERVER-22727 Extend the MMAP index catalog to store path-level multikeyness info
- Closed
-
SERVER-7566 simpler way to figure out if an index is multikey
- Backlog
-
SERVER-3173 Planner should use path-level multikey info to generate covered plans when possible
- Closed
- related to
-
SERVER-31444 Queries against multikey trailing fields of a compound 2d index are incorrectly covered, leading to incorrect results
- Closed
-
SERVER-17064 indexBounds error
- Closed
-
SERVER-18115 The planner can add an unnecessary in-memory sort stage for .min()/.max() queries
- Closed
-
SERVER-21963 Index range scan inaccurate when compound index has an array field
- Closed
-
SERVER-4463 Allow covered index use for non-array fields in multi-key index
- Closed
-
SERVER-4468 Keep track of multi-key fields in index
- Closed