Type: Task
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels: None
Assigned Teams: Query Integration
Confidence Status: None
Work Order: 3
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name: None
Goal Link: None

In a sharded environment, the throughput of an aggregation with {$match: {_id: x}} is about 25x lower than the throughput of the equivalent find command. I've attached flamegraphs comparing the two workloads.
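For reference, the two workloads being compared can be sketched as plain command documents. This is a minimal illustration, not the benchmark used for the flamegraphs: the helper names (`idhack_find_command`, `idhack_aggregate_command`, `ops_per_second`) and the collection name are assumptions made for the example.

```python
import time
from typing import Any, Callable, Dict


def idhack_find_command(collection: str, doc_id: Any) -> Dict[str, Any]:
    """Build a find command that filters on _id only (hypothetical helper)."""
    return {"find": collection, "filter": {"_id": doc_id}}


def idhack_aggregate_command(collection: str, doc_id: Any) -> Dict[str, Any]:
    """Build the equivalent aggregate command: a single $match on _id."""
    return {
        "aggregate": collection,
        "pipeline": [{"$match": {"_id": doc_id}}],
        "cursor": {},
    }


def ops_per_second(run_once: Callable[[], Any], iterations: int = 1000) -> float:
    """Call run_once repeatedly and return the measured operations per second."""
    start = time.perf_counter()
    for _ in range(iterations):
        run_once()
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    return iterations / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # "coll" is a placeholder collection name.
    print(idhack_find_command("coll", 1))
    print(idhack_aggregate_command("coll", 1))
```

To reproduce the comparison, each document would be sent through a mongos in a loop (for example via a driver's generic command call) and the two `ops_per_second` readings compared; that part is left out because it needs a running sharded cluster.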

In the agg case, 91% of the time is spent in cluster_aggregation_planner::getCollationAndUUID, because it has to execute a remote call to the primary shard to retrieve that metadata. We believe that, at least in cases when we're only parsing the pipeline for the sake of query stats, the UUID is optional and the collation can be a default empty object. We should try to avoid that call so the cluster_aggregate IDHack isn't bottlenecked there.
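The proposed short-circuit can be modeled with a small simulation. This is a sketch of the idea only, not the server's C++ code: `PrimaryShardStub`, `CollectionMetadata`, and the `for_query_stats_only` flag are hypothetical names introduced for illustration, and the stub merely counts calls instead of doing real network I/O.

```python
from dataclasses import dataclass, field
from typing import Optional
from uuid import UUID, uuid4


@dataclass
class CollectionMetadata:
    """Metadata the planner wants: default collation plus collection UUID."""
    collation: dict = field(default_factory=dict)
    collection_uuid: Optional[UUID] = None


class PrimaryShardStub:
    """Stand-in for the primary shard; counts simulated remote round trips."""

    def __init__(self) -> None:
        self.remote_calls = 0

    def fetch_metadata(self, namespace: str) -> CollectionMetadata:
        self.remote_calls += 1  # each call models one network round trip
        return CollectionMetadata({"locale": "simple"}, uuid4())


def get_collation_and_uuid(
    shard: PrimaryShardStub, namespace: str, *, for_query_stats_only: bool
) -> CollectionMetadata:
    """Skip the remote call when the pipeline is parsed only for query stats."""
    if for_query_stats_only:
        # Assumption from the description: UUID is optional here and the
        # collation can be a default empty object.
        return CollectionMetadata()
    return shard.fetch_metadata(namespace)


if __name__ == "__main__":
    shard = PrimaryShardStub()
    get_collation_and_uuid(shard, "db.coll", for_query_stats_only=True)
    print(shard.remote_calls)  # no round trip was needed
    get_collation_and_uuid(shard, "db.coll", for_query_stats_only=False)
    print(shard.remote_calls)  # one round trip for the full path
```

The point of the flag is that the cheap path never touches the shard, so the per-request cost for the query-stats-only case no longer includes a network round trip.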


flamegraph_agg_idhack.svg (1.23 MB), uploaded Oct 30 2023 03:23:28 PM UTC by Will Buerger
flamegraph_find_idhack.svg (2.19 MB), uploaded Oct 30 2023 03:23:30 PM UTC by Will Buerger

duplicates: SERVER-80145 Avoid explicit callers of ChunkManager::dbPrimary() when doing shard targeting in agg code (Closed)
is depended on by: SERVER-85082 M2 Performance Report (Closed)

Assignee: [DO NOT USE] Backlog - Query Integration
Reporter: Will Buerger
Participants: [DO NOT USE] Backlog - Query Integration, Will Buerger
Votes: 0
Watchers: 3

Created: Oct 30 2023 03:26:05 PM UTC
Updated: Nov 06 2023 03:15:31 PM UTC
Resolved: Nov 06 2023 03:15:31 PM UTC
