When indexing array fields, most indexes contain one key per element of the array, hence the terminology "multikey indexes" (documented here). However, compound "2d" indexes use a different format for the non-2d fields. Namely, all of the array elements are stored in a single key whose value is itself an array. Consider the following example:
> db.c.drop() false > db.c.createIndex({a: "2d", "b.c": 1}) { "createdCollectionAutomatically" : true, "numIndexesBefore" : 1, "numIndexesAfter" : 2, "ok" : 1 } > db.c.insert({a: [0, 0], b: [{c: 1}, {c: 2}]}) WriteResult({ "nInserted" : 1 }) > db.c.find().hint({a: "2d", "b.c": 1}).returnKey() { "a" : BinData(128,"wAAAAAAAAAA="), "b.c" : [ 1, 2 ] }
As you can see, the inserted document leads to just a single index key. The value of this index key for the "b.c" field is the array [1, 2].
Because of this index format, predicates against the trailing fields of an index are not used to generate bounds against the index. A predicate like {"b.c": {$eq: 2}} would normally result in point lookup in the index for the value 2. However, this would incorrectly miss the above document, because the index key value is the entire array [1, 2]. Instead, the predicate is attached as a filter to the IXSCAN stage, as you can see from the explain of the query below:
> db.c.find({a: {$geoWithin: {$center: [[0, 0], 1]}}, "b.c": {$eq: 2}}).explain() { "queryPlanner" : { ... "inputStage" : { "stage" : "IXSCAN", "filter" : { "b.c" : { "$eq" : 2 } }, "keyPattern" : { "a" : "2d", "b.c" : 1 }, "indexName" : "a_2d_b.c_1", "isMultiKey" : false, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 2, "direction" : "forward", "indexBounds" : { "a" : [ "[BinData(128, 3FFF300000000000), BinData(128, 3FFF3FFFFFFFFFFF)]", "[BinData(128, 3FFF400000000000), BinData(128, 3FFF7FFFFFFFFFFF)]", "[BinData(128, 3FFF800000000000), BinData(128, 3FFFBFFFFFFFFFFF)]", "[BinData(128, 3FFFC00000000000), BinData(128, 3FFFFFFFFFFFFFFF)]", "[BinData(128, 6AAA000000000000), BinData(128, 6AAA3FFFFFFFFFFF)]", "[BinData(128, 6AAA600000000000), BinData(128, 6AAA6FFFFFFFFFFF)]", "[BinData(128, 6AAA800000000000), BinData(128, 6AAABFFFFFFFFFFF)]", "[BinData(128, 6AAAC00000000000), BinData(128, 6AAAFFFFFFFFFFFF)]", "[BinData(128, 9555000000000000), BinData(128, 95553FFFFFFFFFFF)]", "[BinData(128, 9555400000000000), BinData(128, 95557FFFFFFFFFFF)]", "[BinData(128, 9555900000000000), BinData(128, 95559FFFFFFFFFFF)]", "[BinData(128, 9555C00000000000), BinData(128, 9555FFFFFFFFFFFF)]", "[BinData(128, C000000000000000), BinData(128, C0003FFFFFFFFFFF)]", "[BinData(128, C000400000000000), BinData(128, C0007FFFFFFFFFFF)]", "[BinData(128, C000800000000000), BinData(128, C000BFFFFFFFFFFF)]", "[BinData(128, C000C00000000000), BinData(128, C000CFFFFFFFFFFF)]" ], "b.c" : [ "[MinKey, MaxKey]" ] } } }, "rejectedPlans" : [ ] }, ... }
However, this is simply incorrect for certain kinds of queries due to their matching behavior over arrays. For instance, queries which check equality against arrays may return spurious results:
> db.c.find({a: {$geoWithin: {$center: [[0, 0], 1]}}, "b.c": [1, 2]}); { "_id" : ObjectId("59d7c44bf098d066aff4d4b6"), "a" : [ 0, 0 ], "b" : [ { "c" : 1 }, { "c" : 2 } ] }
Similarly, $type:"array" queries may return spurious results:
> db.c.find({a: {$geoWithin: {$center: [[0, 0], 1]}}, "b.c": {$type: "array"}}); { "_id" : ObjectId("59d7c44bf098d066aff4d4b6"), "a" : [ 0, 0 ], "b" : [ { "c" : 1 }, { "c" : 2 } ] }
You can see that these results are spurious by issuing the same queries without the $geoWithin predicate and observing that the result set is empty:
> db.c.find({"b.c": [1, 2]}); > db.c.find({"b.c": {$type: "array"}});
This problem is closely related to that reported in SERVER-21011. However, this one does not affect "text" indexes, because the trailing fields of "text" indexes may not contain arrays. The fix for this problem is also likely to be distinct from that for SERVER-21011.
- duplicates
-
SERVER-21011 Certain queries against compound 2d/text indexes are incorrectly covered, return incorrect results
- Closed
- is related to
-
SERVER-23909 Allow "2d" indices to accept a collation
- Backlog
-
SERVER-24095 Set a feature bit in the KVCatalog when tracking path-level multikey info becomes enabled for an index
- Closed
-
SERVER-15086 Allow for efficient range queries over non-array fields in multikey indices
- Closed
- related to
-
SERVER-31530 $elemMatch over trailing field of a 2d index can miss results
- Closed