-
Type: Question
-
Resolution: Duplicate
-
Priority: Major - P3
-
None
-
Affects Version/s: 4.0.9
-
Component/s: Aggregation Framework
-
None
We've been working on modeling bidirectional relations in MongoDB. We use the common approach of storing ids of related documents in specific fields, e.g.:
"users" {_id: "user1", items: ["item1"], name: "John"} "items" {_id: "item1", owner: "user1", type: "blanket"}
We've created a GraphQL server that uses the Aggregation Framework to perform lookups and resolve e.g. such a query:
{ users { id name items { id type } } }
to such a response:
{ "users": { "id": "user1", "name": "John", "items": [ { "id": "item1", "type": "blanket" } ] } }
Our reasoning was that using a single aggregation pipeline even for complex, nested queries will easily outperform sequential finds on multiple collections with all their associated round-trips and subsequent result manipulation.
Using the example above, if we wanted to fetch some users along with all their items without leveraging the Aggregation Framework, we would have to first query the "users" collection and get the list of associated item ids, then query the "items" collection using those ids and finally connect users and items according to their relationships.
The Aggregation Framework allows us to define all those tasks in a single pipeline and delegate their completion to MongoDB. The only problem is that it's much slower than the find-based approach and the performance gap only grows as we add more complexity. Here's a link to a simple benchmark, executed in the MongoDB, console that illustrates the problem:
https://gist.github.com/alethes/369faf2e1c5a83f4fbcaca9001dbd501
Sample results on my system are:
Aggregation 1: 387 ms Aggregation 2: 103 ms Find: 23 ms
The performance characteristics are almost identical when we use Node.js or C++ drivers.
What could cause such discrepancies? Shouldn't a single, internal aggregation outperform a sequence of 3 find requests along with some JS data manipulation? Are there any issues in our pipeline that lead to slower execution?
- duplicates
-
SERVER-41171 MongoDB $lookup using "$expr" is slow compared to simple $lookup
- Backlog
- is related to
-
SERVER-32549 allow $expr with $in expression use a non-multikey index
- Open
-
SERVER-37470 Lookup sub-pipeline is not using index with the $in operator
- Backlog
-
SERVER-34927 allow localField and foreignField with more expressive $lookup pipeline
- Closed