Loading...

XML

Word

Printable

JSON

Type: Bug
Resolution: Duplicate
Priority: Critical - P2
Fix Version/s: None
Affects Version/s: 2.6.12, 3.0.15, 3.2.17, 3.4.9, 3.5.13
Component/s: Geo, Querying
Labels:
None

Assigned Teams:

Query
Operating System:
ALL
Confidence Status:
None
Work Order:
0

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

When indexing array fields, most indexes contain one key per element of the array, hence the terminology "multikey indexes" (documented here). However, compound "2d" indexes use a different format for the non-2d fields. Namely, all of the array elements are stored in a single key whose value is itself an array. Consider the following example:

> db.c.drop()
false
> db.c.createIndex({a: "2d", "b.c": 1})
{
	"createdCollectionAutomatically" : true,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
> db.c.insert({a: [0, 0], b: [{c: 1}, {c: 2}]})
WriteResult({ "nInserted" : 1 })
> db.c.find().hint({a: "2d", "b.c": 1}).returnKey()
{ "a" : BinData(128,"wAAAAAAAAAA="), "b.c" : [ 1, 2 ] }

As you can see, the inserted document leads to just a single index key. The value of this index key for the "b.c" field is the array [1, 2].

Because of this index format, predicates against the trailing fields of an index are not used to generate bounds against the index. A predicate like {"b.c": {$eq: 2}} would normally result in point lookup in the index for the value 2. However, this would incorrectly miss the above document, because the index key value is the entire array [1, 2]. Instead, the predicate is attached as a filter to the IXSCAN stage, as you can see from the explain of the query below:

> db.c.find({a: {$geoWithin: {$center: [[0, 0], 1]}}, "b.c": {$eq: 2}}).explain()
{
	"queryPlanner" : {
...
			"inputStage" : {
				"stage" : "IXSCAN",
				"filter" : {
					"b.c" : {
						"$eq" : 2
					}
				},
				"keyPattern" : {
					"a" : "2d",
					"b.c" : 1
				},
				"indexName" : "a_2d_b.c_1",
				"isMultiKey" : false,
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"a" : [
						"[BinData(128, 3FFF300000000000), BinData(128, 3FFF3FFFFFFFFFFF)]",
						"[BinData(128, 3FFF400000000000), BinData(128, 3FFF7FFFFFFFFFFF)]",
						"[BinData(128, 3FFF800000000000), BinData(128, 3FFFBFFFFFFFFFFF)]",
						"[BinData(128, 3FFFC00000000000), BinData(128, 3FFFFFFFFFFFFFFF)]",
						"[BinData(128, 6AAA000000000000), BinData(128, 6AAA3FFFFFFFFFFF)]",
						"[BinData(128, 6AAA600000000000), BinData(128, 6AAA6FFFFFFFFFFF)]",
						"[BinData(128, 6AAA800000000000), BinData(128, 6AAABFFFFFFFFFFF)]",
						"[BinData(128, 6AAAC00000000000), BinData(128, 6AAAFFFFFFFFFFFF)]",
						"[BinData(128, 9555000000000000), BinData(128, 95553FFFFFFFFFFF)]",
						"[BinData(128, 9555400000000000), BinData(128, 95557FFFFFFFFFFF)]",
						"[BinData(128, 9555900000000000), BinData(128, 95559FFFFFFFFFFF)]",
						"[BinData(128, 9555C00000000000), BinData(128, 9555FFFFFFFFFFFF)]",
						"[BinData(128, C000000000000000), BinData(128, C0003FFFFFFFFFFF)]",
						"[BinData(128, C000400000000000), BinData(128, C0007FFFFFFFFFFF)]",
						"[BinData(128, C000800000000000), BinData(128, C000BFFFFFFFFFFF)]",
						"[BinData(128, C000C00000000000), BinData(128, C000CFFFFFFFFFFF)]"
					],
					"b.c" : [
						"[MinKey, MaxKey]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
...
}

However, this is simply incorrect for certain kinds of queries due to their matching behavior over arrays. For instance, queries which check equality against arrays may return spurious results:

> db.c.find({a: {$geoWithin: {$center: [[0, 0], 1]}}, "b.c": [1, 2]});
{ "_id" : ObjectId("59d7c44bf098d066aff4d4b6"), "a" : [ 0, 0 ], "b" : [ { "c" : 1 }, { "c" : 2 } ] }

Similarly, $type:"array" queries may return spurious results:

> db.c.find({a: {$geoWithin: {$center: [[0, 0], 1]}}, "b.c": {$type: "array"}});
{ "_id" : ObjectId("59d7c44bf098d066aff4d4b6"), "a" : [ 0, 0 ], "b" : [ { "c" : 1 }, { "c" : 2 } ] }

You can see that these results are spurious by issuing the same queries without the $geoWithin predicate and observing that the result set is empty:

> db.c.find({"b.c": [1, 2]});
> db.c.find({"b.c": {$type: "array"}});

This problem is closely related to that reported in ~~SERVER-21011~~. However, this one does not affect "text" indexes, because the trailing fields of "text" indexes may not contain arrays. The fix for this problem is also likely to be distinct from that for ~~SERVER-21011~~.

duplicates

SERVER-21011 Certain queries against compound 2d/text indexes are incorrectly covered, return incorrect results

Closed

is related to

SERVER-23909 Allow "2d" indices to accept a collation

Backlog

SERVER-24095 Set a feature bit in the KVCatalog when tracking path-level multikey info becomes enabled for an index

Closed

SERVER-15086 Allow for efficient range queries over non-array fields in multikey indices

Closed

related to

SERVER-31530 $elemMatch over trailing field of a 2d index can miss results

Closed

Assignee:: Backlog - Query Team (Inactive)
Reporter:: David Storch
Participants:: Backlog - Query Team, David Storch
Votes:: 0 Vote for this issue
Watchers:: 5 Start watching this issue

Created:: Oct 06 2017 06:15:01 PM UTC
Updated:: Dec 06 2022 03:49:57 AM UTC
Resolved:: Oct 13 2017 10:13:07 PM UTC

Details

Description

Attachments

Issue Links

Activity

People

Dates