Loading...

XML

Word

Printable

JSON

Type: Question
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 4.0.9
Component/s: Aggregation Framework
Labels:
None

Confidence Status:
None
Work Order:
3

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

We've been working on modeling bidirectional relations in MongoDB. We use the common approach of storing ids of related documents in specific fields, e.g.:

"users"
{_id: "user1", items: ["item1"], name: "John"}

"items"
{_id: "item1", owner: "user1", type: "blanket"}

We've created a GraphQL server that uses the Aggregation Framework to perform lookups and resolve e.g. such a query:

{
  users {
    id
    name
    items {
      id
      type
    }
  }
}

to such a response:

{
  "users": {
    "id": "user1",
    "name": "John",
    "items": [
      {
        "id": "item1",
        "type": "blanket"
      }
    ]
  }
}

Our reasoning was that using a single aggregation pipeline even for complex, nested queries will easily outperform sequential finds on multiple collections with all their associated round-trips and subsequent result manipulation.

Using the example above, if we wanted to fetch some users along with all their items without leveraging the Aggregation Framework, we would have to first query the "users" collection and get the list of associated item ids, then query the "items" collection using those ids and finally connect users and items according to their relationships.

The Aggregation Framework allows us to define all those tasks in a single pipeline and delegate their completion to MongoDB. The only problem is that it's much slower than the find-based approach and the performance gap only grows as we add more complexity. Here's a link to a simple benchmark, executed in the MongoDB, console that illustrates the problem:

https://gist.github.com/alethes/369faf2e1c5a83f4fbcaca9001dbd501

Sample results on my system are:

Aggregation 1: 387 ms
Aggregation 2: 103 ms
Find: 23 ms

The performance characteristics are almost identical when we use Node.js or C++ drivers.

What could cause such discrepancies? Shouldn't a single, internal aggregation outperform a sequence of 3 find requests along with some JS data manipulation? Are there any issues in our pipeline that lead to slower execution?

duplicates

SERVER-41171 MongoDB $lookup using "$expr" is slow compared to simple $lookup

Backlog

is related to

SERVER-37470 Lookup sub-pipeline is not using index with the $in operator

Backlog

SERVER-32549 allow $expr with $in expression to use indexes

Closed

SERVER-34927 allow localField and foreignField with more expressive $lookup pipeline

Closed

Assignee:: Eric Sedor
Reporter:: James Daab
Participants:: Eric Sedor, James Daab
Votes:: 0 Vote for this issue
Watchers:: 6 Start watching this issue

Created:: Dec 31 2019 12:44:45 PM UTC
Updated:: Feb 19 2020 05:07:33 PM UTC
Resolved:: Feb 19 2020 05:07:33 PM UTC

Details

Description

Attachments

Issue Links

Activity

People

Dates