- Type: Improvement
- Resolution: Works as Designed
- Priority: Major - P3
- Affects Version/s: 3.6.8
- Component/s: Aggregation Framework, Querying

(copied to CRM)
Hi,
I have a performance issue with a collection of almost 4M documents: find() (or a $match stage in an aggregation) executes a FETCH whenever I use $elemMatch, even though the index is working well and returning only the correct documents.
The FETCH is unnecessary because, as of version 3.6, multiKeyPaths is populated and lets the $elemMatch query filter correctly on date ranges. So I think MongoDB should skip the FETCH, since it causes a huge performance drop.
Here is an example of the query planner output:
{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "mp.userdocuments",
    "indexFilterSet" : false,
    "parsedQuery" : {
      "$and" : [
        { "userActs" : { "$elemMatch" : { "publishedDate2" : { "$elemMatch" : { "$lte" : ISODate("2018-10-01T18:31:48.000Z"), "$gte" : ISODate("2017-01-01T18:31:48.000Z") } } } } },
        { "isArchived" : { "$eq" : false } },
        { "user" : { "$eq" : ObjectId("556efbaaef2541ec0d80a1ba") } }
      ]
    },
    "winningPlan" : {
      "stage" : "FETCH",
      "filter" : { "userActs" : { "$elemMatch" : { "publishedDate2" : { "$elemMatch" : { "$lte" : ISODate("2018-10-01T18:31:48.000Z"), "$gte" : ISODate("2017-01-01T18:31:48.000Z") } } } } },
      "inputStage" : {
        "stage" : "IXSCAN",
        "keyPattern" : { "user" : 1, "isArchived" : 1, "userActs.publishedDate2" : 1 },
        "indexName" : "user_1_isArchived_1_userActs.publishedDate2_1",
        "isMultiKey" : true,
        "multiKeyPaths" : { "user" : [], "isArchived" : [], "userActs.publishedDate2" : [ "userActs", "userActs.publishedDate2" ] },
        "isUnique" : false,
        "isSparse" : false,
        "isPartial" : false,
        "indexVersion" : 2,
        "direction" : "forward",
        "indexBounds" : {
          "user" : [ "[ObjectId('556efbaaef2541ec0d80a1ba'), ObjectId('556efbaaef2541ec0d80a1ba')]" ],
          "isArchived" : [ "[false, false]" ],
          "userActs.publishedDate2" : [ "[new Date(1483295508000), new Date(1538418708000)]" ]
        }
      }
    },
    "rejectedPlans" : []
  },
  "executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 13033,
    "executionTimeMillis" : 175,
    "totalKeysExamined" : 15228,
    "totalDocsExamined" : 13033,
    "executionStages" : {
      "stage" : "FETCH",
      "filter" : { "userActs" : { "$elemMatch" : { "publishedDate2" : { "$elemMatch" : { "$lte" : ISODate("2018-10-01T18:31:48.000Z"), "$gte" : ISODate("2017-01-01T18:31:48.000Z") } } } } },
      "nReturned" : 13033,
      "executionTimeMillisEstimate" : 152,
      "works" : 15229,
      "advanced" : 13033,
      "needTime" : 2195,
      "needYield" : 0,
      "saveState" : 119,
      "restoreState" : 119,
      "isEOF" : 1,
      "invalidates" : 0,
      "docsExamined" : 13033,
      "alreadyHasObj" : 0,
      "inputStage" : {
        "stage" : "IXSCAN",
        "nReturned" : 13033,
        "executionTimeMillisEstimate" : 41,
        "works" : 15229,
        "advanced" : 13033,
        "needTime" : 2195,
        "needYield" : 0,
        "saveState" : 119,
        "restoreState" : 119,
        "isEOF" : 1,
        "invalidates" : 0,
        "keyPattern" : { "user" : 1, "isArchived" : 1, "userActs.publishedDate2" : 1 },
        "indexName" : "user_1_isArchived_1_userActs.publishedDate2_1",
        "isMultiKey" : true,
        "multiKeyPaths" : { "user" : [], "isArchived" : [], "userActs.publishedDate2" : [ "userActs", "userActs.publishedDate2" ] },
        "isUnique" : false,
        "isSparse" : false,
        "isPartial" : false,
        "indexVersion" : 2,
        "direction" : "forward",
        "indexBounds" : {
          "user" : [ "[ObjectId('556efbaaef2541ec0d80a1ba'), ObjectId('556efbaaef2541ec0d80a1ba')]" ],
          "isArchived" : [ "[false, false]" ],
          "userActs.publishedDate2" : [ "[new Date(1483295508000), new Date(1538418708000)]" ]
        },
        "keysExamined" : 15228,
        "seeks" : 1,
        "dupsTested" : 15228,
        "dupsDropped" : 2195,
        "seenInvalidated" : 0
      }
    }
  },
  "serverInfo" : { "host" : "ML-Lenovo", "port" : 27017, "version" : "3.6.8", "gitVersion" : "6bc9ed599c3fa164703346a22bad17e33fa913e4" },
  "ok" : 1
}
So, as you can see, out of the ~23K documents for this user id, the index scan correctly narrows the result to 13033 documents, and then the FETCH stage applies the same filter again and also returns 13033, so the result cannot be different.
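To illustrate the point above, here is a minimal sketch in plain Python (not MongoDB internals; ix_scan and fetch_filter are made-up names) of an index scan over per-element multikey entries followed by a FETCH stage that re-applies the same $elemMatch date-range predicate to the fetched documents:

```python
from datetime import datetime

# Query bounds, taken from the explain output above.
LO = datetime(2017, 1, 1, 18, 31, 48)
HI = datetime(2018, 10, 1, 18, 31, 48)

def ix_scan(index_entries):
    """IXSCAN sketch: yield doc ids whose multikey entry is in bounds, deduplicated."""
    seen = set()
    for key, doc_id in index_entries:
        if LO <= key <= HI and doc_id not in seen:
            seen.add(doc_id)
            yield doc_id

def fetch_filter(doc):
    """FETCH sketch: the residual $elemMatch re-check against the full document."""
    return any(LO <= d <= HI
               for act in doc["userActs"]
               for d in act["publishedDate2"])

# One document that matches the range and one that does not.
docs = {
    1: {"userActs": [{"publishedDate2": [datetime(2016, 2, 15)]},
                     {"publishedDate2": [datetime(2018, 3, 5)]}]},
    2: {"userActs": [{"publishedDate2": [datetime(2016, 10, 4)]}]},
}
# Multikey index: one entry per publishedDate2 element.
entries = [(d, doc_id)
           for doc_id, doc in docs.items()
           for act in doc["userActs"]
           for d in act["publishedDate2"]]

scanned = list(ix_scan(entries))
fetched = [doc_id for doc_id in scanned if fetch_filter(docs[doc_id])]
print(scanned, fetched)  # → [1] [1]: the residual filter keeps every scanned document
```

In this sketch, as in the explain output, every document that survives the index scan also passes the re-applied filter, which is exactly the redundancy being reported.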
Here is an example of a document:
{
  "_id" : ObjectId("5b4d188ca958130c528f21ac"),
  "user" : ObjectId("556efbaaef2541ec0d80a1ba"),
  "isArchived" : false,
  "userActs" : [
    { "publishedDate2" : [ ISODate("2016-02-15T05:00:00.000Z") ] },
    { "publishedDate2" : [ ISODate("2016-03-07T05:00:00.000Z") ] },
    { "publishedDate2" : [ ISODate("2016-10-04T04:00:00.000Z") ] },
    { "publishedDate2" : [ ISODate("2016-10-13T04:00:00.000Z") ] },
    { "publishedDate2" : [ ISODate("2017-01-04T05:00:00.000Z") ] },
    { "publishedDate2" : [ ISODate("2018-03-05T05:00:00.000Z") ] }
  ],
  "__v" : 6
}
The index is:
{ "user" : 1, "isArchived" : 1, "userActs.publishedDate2" : 1 }
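For reference, a hedged sketch of how a multikey index flattens this nested structure: one index key per value reached along the multikey paths ("userActs" and "userActs.publishedDate2"), so the sample document above contributes six keys. multikey_index_keys is a made-up helper for illustration, not a server API:

```python
from datetime import datetime

def multikey_index_keys(doc):
    """Sketch: emit one (user, isArchived, date) key per publishedDate2 element."""
    return [(doc["user"], doc["isArchived"], d)
            for act in doc["userActs"]
            for d in act["publishedDate2"]]

# The sample document from above.
doc = {
    "user": "556efbaaef2541ec0d80a1ba",
    "isArchived": False,
    "userActs": [
        {"publishedDate2": [datetime(2016, 2, 15, 5)]},
        {"publishedDate2": [datetime(2016, 3, 7, 5)]},
        {"publishedDate2": [datetime(2016, 10, 4, 4)]},
        {"publishedDate2": [datetime(2016, 10, 13, 4)]},
        {"publishedDate2": [datetime(2017, 1, 4, 5)]},
        {"publishedDate2": [datetime(2018, 3, 5, 5)]},
    ],
}

keys = multikey_index_keys(doc)
print(len(keys))  # → 6: one key per array element, which is why isMultiKey is true
```

Because several keys can point at the same document, the IXSCAN must deduplicate document ids, which is consistent with the dupsTested/dupsDropped counters in the execution stats above.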
The execution of the FETCH stage is killing the performance.
Can we fix this and avoid the FETCH stage when the index isMultiKey and the fields in the $elemMatch are covered by the index and listed in its multiKeyPaths?
Otherwise, I guess I'll need to change my schema to avoid arrays (not good, since we would need to duplicate a lot of data).
Thanks
- related to: SERVER-35223 "$elemMatch forces mongo to fetch documents instead of using COUNT_SCAN" (Closed)