Type: Question
Resolution: Duplicate
Priority: Major - P3
Affects Version/s: 2.6.4
Component/s: Querying
I have a large collection of documents that look like:
// Example Document 1
{ _id: 1, authorized_advertisers: [1, 2, 3], timestamp: 1406865600 }
// Example Document 2
{ _id: 2, authorized_advertisers: [1], timestamp: 1406865600 }
// Example Document 3
{ _id: 3, authorized_advertisers: [], timestamp: 1406865600 }
I have the following two indexes defined on this collection (among others):
{
    "v" : 1,
    "key" : { "authorized_advertisers" : 1 },
    "name" : "authorized_advertisers_1",
    "ns" : "cmply_app_engage_production.social_engagements",
    "sparse" : true
}
{
    "v" : 1,
    "key" : { "authorized_advertisers" : 1, "timestamp" : 1 },
    "name" : "authorized_advertisers_1_timestamp_1",
    "ns" : "cmply_app_engage_production.social_engagements"
}
When I execute queries against this collection, a seemingly larger-than-necessary number of documents is scanned to produce the results. If I hint the query engine to use the authorized_advertisers_1_timestamp_1 index, the lower bound for the timestamp appears to be ignored.
{
    "authorized_advertisers" : ObjectId("4e6906f2dd06fd1f50000007"),
    "timestamp" : { "$lte" : 1408371025, "$gt" : 1406865600 }
}
db.social_engagements.find(q).hint("authorized_advertisers_1_timestamp_1").explain()
{
    "cursor" : "BtreeCursor authorized_advertisers_1_timestamp_1",
    "isMultiKey" : true,
    "n" : 148823,
    "nscannedObjects" : 1274509,
    "nscanned" : 1274509,
    "nscannedObjectsAllPlans" : 1274509,
    "nscannedAllPlans" : 1274509,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 9959,
    "nChunkSkips" : 0,
    "millis" : 6447,
    "indexBounds" : {
        "authorized_advertisers" : [ [ ObjectId("4e6906f2dd06fd1f50000007"), ObjectId("4e6906f2dd06fd1f50000007") ] ],
        "timestamp" : [ [ -Infinity, 1408371025 ] ]
    },
    "server" : "mydb.server.com:27017",
    "filterSet" : false
}
db.social_engagements.find(q).explain()
{
    "cursor" : "BtreeCursor authorized_advertisers_1",
    "isMultiKey" : true,
    "n" : 148823,
    "nscannedObjects" : 1275405,
    "nscanned" : 1275405,
    "nscannedObjectsAllPlans" : 1275975,
    "nscannedAllPlans" : 1275976,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 9969,
    "nChunkSkips" : 0,
    "millis" : 7162,
    "indexBounds" : {
        "authorized_advertisers" : [ [ ObjectId("4e6906f2dd06fd1f50000007"), ObjectId("4e6906f2dd06fd1f50000007") ] ]
    },
    "server" : "mydb.server.com:27017",
    "filterSet" : false
}
In each case, the same number of documents is returned. I don't understand why the lower bound for the timestamp is ignored; honoring it would seem to reduce the number of documents (in the collection or in the index) the query needs to scan to produce a result.
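For what it's worth, here is a small Python sketch of my mental model of the behavior (an assumption on my part, not a description of MongoDB internals): a multikey compound index stores one key per array element, so a single document appears several times in the index, and the planner conservatively declines to compound bounds across fields even when, as here, the second field (timestamp) is never an array. The document with timestamp 1407000000 below is hypothetical, chosen to fall inside the query range.

```python
# Illustrative sketch only: how a multikey compound index on
# { authorized_advertisers: 1, timestamp: 1 } might expand one
# document into several (array_element, scalar) index keys.

def multikey_entries(doc, array_field, scalar_field):
    """Expand a document into (array_element, scalar) index keys,
    one key per element of the array field."""
    values = doc[array_field] if doc[array_field] else [None]
    return [(v, doc[scalar_field]) for v in values]

# Hypothetical document whose timestamp lies inside the query range.
doc = {"_id": 9, "authorized_advertisers": [1, 2, 3], "timestamp": 1407000000}
entries = multikey_entries(doc, "authorized_advertisers", "timestamp")
print(entries)  # [(1, 1407000000), (2, 1407000000), (3, 1407000000)]

# Because timestamp is a scalar in every key, applying BOTH bounds
# (advertiser == 2 and 1406865600 < ts <= 1408371025) over these
# keys would be safe and far more selective:
tight = [(a, t) for (a, t) in entries
         if a == 2 and 1406865600 < t <= 1408371025]
print(tight)  # [(2, 1407000000)]
```

If that model is right, this looks like exactly the kind of tightening the planner is currently skipping.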
I've also tried an index with the timestamp attribute as the first key, but performance was significantly worse than with my current configuration.
These same criteria are used in Aggregation Framework queries, too. The queries take on average 7-10s to complete.
Duplicates: SERVER-15086 Allow for efficient range queries over non-array fields in multikey indices (Closed)