Type: Improvement
Resolution: Fixed
Priority: Major - P3
Fix Version/s: WT10.0.0, 4.9.0, 4.4.3
Affects Version/s: None
Component/s: None
Labels: dev-prod
Story Points: 3
Sprint: Storage - Ra 2020-11-16

Triaging and diagnosing hang failures in automated Evergreen tests should be easier. After determining that a test is hung, Evergreen should automatically collect and report data that will help with the initial triage and diagnosis of the problem. Ideally we might collect:

- What test programs were running at the time of the hang.
- The WiredTiger directory for those tests (I believe we already keep this for all tests).
- Cores of the hung process(es), to help engineers determine why they were hung.
- Stack traces from the hung processes, to include in the Evergreen logs to facilitate triage.

There is probably other data that would be useful as well.

MongoDB's resmoke.py includes a hang analyzer that they use for this purpose: buildscripts/resmokelib/commands/hang_analyzer.py. We might be able to use it as the basis for a WT hang analyzer, or simply adopt it outright.
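
To make the collection steps above concrete, here is a minimal sketch of what a hang analyzer along these lines might do on Linux: list the running test processes, then attach gdb to each one to print stack traces and optionally write a core. This is not the logic of resmoke's hang_analyzer.py. The function names (`parse_ps_output`, `gdb_command`, `collect_stacks`), the `test_` name fragment, and the `core.<pid>` file naming are all hypothetical, and the sketch assumes `ps` and `gdb` are on the PATH.

```python
import os
import subprocess


def parse_ps_output(ps_output, name_fragments):
    """Return (pid, command) pairs whose command line contains any fragment."""
    matches = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip a header line, blank lines, or anything unparsable
        pid, command = int(parts[0]), parts[1].strip()
        if any(fragment in command for fragment in name_fragments):
            matches.append((pid, command))
    return matches


def gdb_command(pid, core_path=None):
    """Build a gdb batch invocation that prints every thread's stack.

    If core_path is given, gdb also writes a core file there via `gcore`.
    """
    cmd = ["gdb", "--batch", "-p", str(pid), "-ex", "thread apply all bt"]
    if core_path is not None:
        cmd += ["-ex", "gcore " + core_path]
    return cmd


def collect_stacks(name_fragments, dump_cores=False):
    """Print stack traces for every matching process (hypothetical driver)."""
    ps_output = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "args="],
        capture_output=True, text=True, check=True,
    ).stdout
    for pid, command in parse_ps_output(ps_output, name_fragments):
        if pid == os.getpid():
            continue  # do not attach to the analyzer itself
        print(f"=== pid {pid}: {command}")
        core_path = f"core.{pid}" if dump_cores else None
        result = subprocess.run(
            gdb_command(pid, core_path), capture_output=True, text=True
        )
        # The traces go to stdout so they land in the task log.
        print(result.stdout)
```

A task's timeout handler could call something like `collect_stacks(["test_"], dump_cores=True)`. Attaching with gdb normally requires the same user (or ptrace permission) as the hung process, and macOS and Windows would need different debuggers.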

Attachment: image-2020-11-03-11-47-26-634.png (19 kB, uploaded by Luke Chen, Nov 03 2020 12:47:27 AM UTC)

causes
- WT-6919 Windows cannot find the debug symbols - Hang analyzer. (Backlog)
- WT-6918 lldb cannot attach to processes in MacOS - Hang analyzer (Backlog)

is related to
- WT-5438 Investigate using the Mongo Evergreen hang analysis script in WiredTiger Evergreen (Closed)

Assignee: Ravi Giri
Reporter: Keith Smith
Votes: 0
Watchers: 5
Created: Jun 19 2020 12:07:12 AM UTC
Updated: Oct 29 2023 04:43:20 PM UTC
Resolved: Nov 13 2020 05:54:28 AM UTC
