Type: Question
Resolution: Duplicate
Priority: Major - P3
Affects Version/s: 2.6.4
Component/s: Querying
I have a large collection of documents that look like:
// Example Document 1
{ _id: 1, authorized_advertisers: [1, 2, 3], timestamp: 1406865600 }
// Example Document 2
{ _id: 2, authorized_advertisers: [1], timestamp: 1406865600 }
// Example Document 3
{ _id: 3, authorized_advertisers: [], timestamp: 1406865600 }
I have the following two indexes defined on this collection (among others):
{
    "v" : 1,
    "key" : { "authorized_advertisers" : 1 },
    "name" : "authorized_advertisers_1",
    "ns" : "cmply_app_engage_production.social_engagements",
    "sparse" : true
}
{
    "v" : 1,
    "key" : { "authorized_advertisers" : 1, "timestamp" : 1 },
    "name" : "authorized_advertisers_1_timestamp_1",
    "ns" : "cmply_app_engage_production.social_engagements"
}
When I execute queries against this collection, a seemingly larger-than-necessary number of documents is scanned to produce the results. If I hint the query engine to use the authorized_advertisers_1_timestamp_1 index, the lower bound for the timestamp appears to be ignored.
{
    "authorized_advertisers" : ObjectId("4e6906f2dd06fd1f50000007"),
    "timestamp" : { "$lte" : 1408371025, "$gt" : 1406865600 }
}
db.social_engagements.find(q).hint("authorized_advertisers_1_timestamp_1").explain()
{
    "cursor" : "BtreeCursor authorized_advertisers_1_timestamp_1",
    "isMultiKey" : true,
    "n" : 148823,
    "nscannedObjects" : 1274509,
    "nscanned" : 1274509,
    "nscannedObjectsAllPlans" : 1274509,
    "nscannedAllPlans" : 1274509,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 9959,
    "nChunkSkips" : 0,
    "millis" : 6447,
    "indexBounds" : {
        "authorized_advertisers" : [ [ ObjectId("4e6906f2dd06fd1f50000007"), ObjectId("4e6906f2dd06fd1f50000007") ] ],
        "timestamp" : [ [ -Infinity, 1408371025 ] ]
    },
    "server" : "mydb.server.com:27017",
    "filterSet" : false
}
db.social_engagements.find(q).explain()
{
    "cursor" : "BtreeCursor authorized_advertisers_1",
    "isMultiKey" : true,
    "n" : 148823,
    "nscannedObjects" : 1275405,
    "nscanned" : 1275405,
    "nscannedObjectsAllPlans" : 1275975,
    "nscannedAllPlans" : 1275976,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 9969,
    "nChunkSkips" : 0,
    "millis" : 7162,
    "indexBounds" : {
        "authorized_advertisers" : [ [ ObjectId("4e6906f2dd06fd1f50000007"), ObjectId("4e6906f2dd06fd1f50000007") ] ]
    },
    "server" : "mydb.server.com:27017",
    "filterSet" : false
}
In each case, the same number of documents is returned. I don't understand why the lower bound for the timestamp is ignored; honoring it would seem to reduce the number of documents (in the collection or in the index) the query needs to scan to produce a result.
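For what it's worth, here is a small Python sketch of my mental model of the behavior (an assumption on my part, not a description of MongoDB internals): a multikey compound index stores one key per array element, so a single document appears several times in the index, and the planner conservatively declines to compound bounds across fields even when, as here, the second field (timestamp) is never an array. The document with timestamp 1407000000 below is hypothetical, chosen to fall inside the query range.

```python
# Illustrative sketch only: how a multikey compound index on
# { authorized_advertisers: 1, timestamp: 1 } might expand one
# document into several (array_element, scalar) index keys.

def multikey_entries(doc, array_field, scalar_field):
    """Expand a document into (array_element, scalar) index keys,
    one key per element of the array field."""
    values = doc[array_field] if doc[array_field] else [None]
    return [(v, doc[scalar_field]) for v in values]

# Hypothetical document whose timestamp lies inside the query range.
doc = {"_id": 9, "authorized_advertisers": [1, 2, 3], "timestamp": 1407000000}
entries = multikey_entries(doc, "authorized_advertisers", "timestamp")
print(entries)  # [(1, 1407000000), (2, 1407000000), (3, 1407000000)]

# Because timestamp is a scalar in every key, applying BOTH bounds
# (advertiser == 2 and 1406865600 < ts <= 1408371025) over these
# keys would be safe and far more selective:
tight = [(a, t) for (a, t) in entries
         if a == 2 and 1406865600 < t <= 1408371025]
print(tight)  # [(2, 1407000000)]
```

If that model is right, this looks like exactly the kind of tightening the planner is currently skipping.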
I've also tried an index with the timestamp attribute as the first key, but performance was significantly worse than with my current configuration.
These same criteria are used in Aggregation Framework queries, too. The queries take on average 7-10s to complete.
Duplicates: SERVER-15086 Allow for efficient range queries over non-array fields in multikey indices (Closed)