-
Type: Improvement
-
Resolution: Duplicate
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: Index Maintenance
-
None
Related to this discussion (long, look at the end for most of the substance):
https://groups.google.com/d/topic/mongodb-user/mr1uz9PuEek/discussion
Currently, MongoDB uses the first element in the $all operator to find the index and return the documents that are then scanned for the matches for the rest of the match criteria.
Here is an example of the documents in the collection and a query, as well as the explain info:
> r.ensureIndex(
{records:1})
> r.find({records:{ $all:[
,
{status:200} ] }})
{ "_id" : ObjectId("4d6574f485904ac21ab43fca"), "bytes" : 10, "id" : 1, "records" : [
,
{ "url" : "yahoo.com" },
{ "custid" : 456 } ] }
{ "_id" : ObjectId("4d6574fa85904ac21ab43fcb"), "bytes" : 10, "id" : 1, "records" : [
,
{ "url" : "cnn.com" },
{ "custid" : 456 }] }
> r.find({records:{ $all:[
{custid:456},
{status:200} ] }}).explain()
{
"cursor" : "BtreeCursor records_1",
"nscanned" : 4,
"nscannedObjects" : 4,
"n" : 2,
"millis" : 0,
"indexBounds" : {
"records" : [
[
,
{ "custid" : 456 } ]
]
}
}
According to how it was explained, the index used in this case (and similar queries) is always that of the first element of the query match array ($all), in this case on "custid" field, and then the documents returned from that index are scanned for matching of the rest of the query match terms (status:200 in this case). Cardinality of the indexes of the indexes is not taken into account when selecting which index to apply for the query - it is always just the index of the first field specified in $all.
A couple suggestions
- evaluate all the query fields and select the most effective index (e.g. based on cardinality) instead of always the 1st element index.
- use indexes for each step of the evaluation, i.e. after returning the first set of documents use index again to match against "status":200 instead of performing the document scan for all subsequent elements.
It would greatly enhance the effectiveness of performing queries on arbitrary number of embedded array elements.
- duplicates
-
SERVER-1000 $all with query optimizer
- Closed