- Type: Improvement
- Resolution: Unresolved
- Priority: Unknown
- Component/s: Performance Benchmarking
see note on DRIVERS-2557
Summary
Drivers should ensure any performance testing, including but not limited to the driver spec performance benchmarks:
- uses a dedicated distro in evergreen (to avoid fluctuations in performance due to distro changes)
- uses a patch-pinned server version for integration performance tests (to avoid fluctuations in performance due to server performance profile changes)
- utilizes the performance analytics backend for change point detection via the perf.send command (the tooling uses sophisticated algorithms to detect real points of change in performance and minimize noise)
- sets up actionable alerting based on performance results
Motivation
Who is the affected end user?
This should allow driver teams to get ahead of performance regressions
How does this affect the end user?
N/A
How likely is it that this problem or use case will occur?
N/A
If the problem does occur, what are the consequences and how severe are they?
N/A
Is this issue urgent?
The sooner the drivers implement the standardized architecture, the sooner they can start building up a history of reliable performance data.
Is this ticket required by a downstream team?
No
Is this ticket only for tests?
Yes
Acceptance Criteria
Teams can begin implementing this ticket directly. Depending on a team's existing setup, it may choose to create an epic to address the different aspects of the work outlined here.
- The dedicated performance distro is rhel90-dbx-perf-large (others can be created if needed in coordination with the build team)
- Driver evergreen tools exposes the patch-pinned v6 server version 6.0.6, which can be referenced via the `v6.0-perf` version alias (the analogous perf-stable v7 version will be added later)
- To use the performance analytics backend, it is sufficient to invoke the perf.send command in the evergreen run:
- Docs on perf.send: https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-Commands#perfsend
- Instructions for configuring your team permissions and projects to use the associated monitoring and examples for perf.send: https://github.com/10gen/performance-tooling-docs/blob/main/getting_started/performance_monitoring_setup.md
- Note: improved docs for the perf.send format will be available upon the completion of EVG-17598 (see PR)
- There will be a "trend charts" tab for the evergreen task where you can view the results provided you have enabled the monitoring for the project (Node driver example)
- Notifications for change point detection can be set up in kanopy splunk and sent directly to slack (there are many additional notification options available)
- NOTE: Change point detection works retrospectively: as new data flows in, it can detect statistically significant changes in the distribution that were not apparent before and create change points in the past. A large, sustained change is usually detected within a few days of the commit date. In noisier time series, or for less prominent changes, it can take longer before a change point is detected on a commit. The sample queries below do not limit the date range of the commits considered for change point detection; a limit can be added to the query if notifications for commits too far in the past become too noisy. Here's an example query limiting the search range to 60 days:
message="New change point detected." index="server-tig-prod" | spath output=project path="change_point.time_series_info.project" | search project IN ("mongo-node-driver-next", "node-bson") | spath output=run_date path="change_point.evg_create_date" | eval days_since=(now()-strptime(run_date, "%Y-%m-%dT%H:%M:%S%:z"))/86400 | search days_since < 60
- NOTE #2: All change points can be triaged, linked to jira tickets, and marked as true or false positives in the build baron UI: https://performance-monitoring-and-analysis.server-tig.prod.corp.mongodb.com/baron (sample filter for the node project); however, this UI is somewhat clunky and, considering the expected volume of change points for a typical driver project, may not be the most efficient way for drivers to act on true positives. Therefore, drivers may choose to implement their own process of triaging change points without formally marking each one in the build baron system.
- NOTE #3: Remember to set appropriate read/write permissions for your alert. Read permissions can be safely set to everyone. However, in order to set your custom alert edit permissions to just your team, your team's mana group will need to be mapped to a kanopy splunk role; if your team does not appear in the role list, you will need to file an IT ticket to request it to be added.
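As a rough illustration of the perf.send step above, a benchmark runner might collect its results into a JSON report file that the command then uploads. The sketch below is a minimal one under stated assumptions: the entry layout (`info.test_name` plus a `metrics` list of `name`/`value` pairs) and the `megabytes_per_second` metric name are assumptions, not a statement of the required format, and `build_report` and `write_report` are hypothetical helper names. Check the perf.send docs linked above for the authoritative schema.

```python
import json
from pathlib import Path


def build_report(results):
    """Turn {test_name: score} into a list of report entries.

    ASSUMPTION: the entry layout and the metric name are illustrative;
    verify them against the perf.send documentation before relying on them.
    """
    report = []
    for test_name, score in sorted(results.items()):
        report.append(
            {
                "info": {"test_name": test_name},
                "metrics": [{"name": "megabytes_per_second", "value": score}],
            }
        )
    return report


def write_report(results, path):
    """Serialize the report to the file that the evergreen task would upload."""
    Path(path).write_text(json.dumps(build_report(results), indent=2))
```

A benchmark task could call `write_report(scores, "results.json")` as its last step and point the perf.send command at that file; both the file name and the `scores` dictionary are placeholders.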
Sample splunk query for a single evergreen project:
message="New change point detected." index="server-tig-prod" | spath "change_point.time_series_info.project" | search "change_point.time_series_info.project"="mongo-node-driver-next"
Sample splunk query for multiple evergreen projects:
message="New change point detected." index="server-tig-prod" | spath "change_point.time_series_info.project" | search "change_point.time_series_info.project" IN ("mongo-node-driver-next", "node-bson")
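To sanity-check the project filter and the 60-day cut-off outside of splunk, the same logic can be mirrored in a few lines of Python. This is only a sketch: it assumes each event is a dictionary nested the same way as the field paths used in the queries (`change_point.time_series_info.project`, `change_point.evg_create_date`) and that timestamps carry a numeric UTC offset such as `+00:00`. The function names are hypothetical.

```python
from datetime import datetime, timezone

MAX_AGE_DAYS = 60
PROJECTS = {"mongo-node-driver-next", "node-bson"}


def days_since(run_date, now):
    """Age in days of an ISO 8601 timestamp, like (now() - strptime(...)) / 86400."""
    return (now - datetime.fromisoformat(run_date)).total_seconds() / 86400


def recent_change_points(events, now):
    """Keep events for the selected projects whose commit is under MAX_AGE_DAYS old."""
    kept = []
    for event in events:
        change_point = event["change_point"]
        if change_point["time_series_info"]["project"] not in PROJECTS:
            continue
        if days_since(change_point["evg_create_date"], now) < MAX_AGE_DAYS:
            kept.append(event)
    return kept
```

Passing `now` explicitly (for example `datetime.now(timezone.utc)`) keeps the filter deterministic, which makes it easy to try different cut-offs against a saved batch of events before changing the alert.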
Sample notification message:
New change point from `$result.change_point.commit_date$` *$result.change_point.message$* (<https://spruce.mongodb.com/task/$result.change_point.task_id$/trend-charts|CI Link>)
```
Project: $result.change_point.time_series_info.project$
Variant: $result.change_point.time_series_info.variant$
Task: $result.change_point.time_series_info.task$
Test: $result.change_point.time_series_info.test$
Measurement: $result.change_point.time_series_info.measurement$
Percent change: $result.change_point.percent_change$
Repo: $result.change_point.repo_full_name$
Branch: $result.change_point.branch$
```
For greater visibility, you may want to add the repo and branch (or any other fields you want to highlight) into the "Fields" box below the "Message" in the alert setup, e.g.:
change_point.repo_full_name,change_point.branch
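To preview how a notification message will read before saving the alert, the `$result.…$` tokens can be filled in locally from a sample event. The sketch below is not how splunk performs the substitution; it is a stand-in that assumes the sample event is a nested dictionary and that each token is a dotted path into it. `render` and `lookup` are hypothetical helpers.

```python
import re

# Matches tokens such as $result.change_point.percent_change$
TOKEN = re.compile(r"\$result\.([A-Za-z0-9_.]+)\$")


def lookup(result, dotted_path):
    """Walk a nested dict along a dotted path, e.g. 'change_point.branch'."""
    value = result
    for key in dotted_path.split("."):
        value = value[key]
    return value


def render(template, result):
    """Replace every $result.<path>$ token with the matching value from result."""
    return TOKEN.sub(lambda match: str(lookup(result, match.group(1))), template)
```

A missing field raises `KeyError`, which is a quick way to spot a token that does not match the shape of the sample event.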
is related to
- DRIVERS-2779 Standardize performance benchmark reporting metrics (Backlog)
- DRIVERS-2957 Add guidelines for testing performance benchmarks on official and pre-release server versions (Ready for Work)
related to
- CSHARP-4670 Implement Drivers Performance Benchmarking Testing Specification (Closed)
- DRIVERS-2557 Integrate with the Server Performance Project (Closed)
split to
- RUBY-3290 Standardize performance testing infrastructure (Backlog)
- CDRIVER-4676 Standardize performance testing infrastructure (Closed)
- CSHARP-4713 Standardize performance testing infrastructure (Closed)
- CXX-2710 Standardize performance testing infrastructure (Closed)
- GODRIVER-2898 Standardize performance testing infrastructure (Closed)
- JAVA-5065 Standardize performance testing infrastructure (Closed)
- MOTOR-1149 Standardize performance testing infrastructure (Closed)
- NODE-5440 Standardize performance testing infrastructure (Closed)
- PHPLIB-1187 Standardize performance testing infrastructure (Closed)
- PYTHON-3823 Standardize performance testing infrastructure (Closed)
- RUST-1698 Standardize performance testing infrastructure (Closed)