Type: Bug
Resolution: Works as Designed
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- cs-bf-external

Assigned Teams:

Service Arch
Operating System:
ALL
Sprint:
Workload Scheduling 2024-07-22
Linked BF Score:
0
Confidence Status:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

At the time of writing, BF-33912's lone BFG appears to have been caused by a replication error on a TSAN variant (which is quite slow) that left a host down for longer than usual (see the comments here for details).

As a result, the client threads received a mix of "Connection reset by peer", "Connection refused", and "HostUnreachable" errors. Of these, only HostUnreachable is considered a retriable error that does not consume the retry limit.

In suites where we kill/terminate shard processes, we should expect to receive network errors more frequently, and we should expect them to be transient.
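The retry behavior described above can be illustrated with a minimal Python sketch. It is not the actual test harness code: NetworkError, run_with_retries, free_retry_errors, and max_total_attempts are hypothetical names introduced only for this example. The sketch assumes errors carry a symbolic name, that names in free_retry_errors are retried without consuming the retry limit, and that every other network error consumes one retry.

```python
from typing import Callable, FrozenSet, TypeVar

T = TypeVar("T")


class NetworkError(Exception):
    """Hypothetical error type carrying a symbolic error name."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.name = name


def run_with_retries(
    op: Callable[[], T],
    retry_limit: int,
    free_retry_errors: FrozenSet[str] = frozenset({"HostUnreachable"}),
    max_total_attempts: int = 1000,
) -> T:
    """Run op, retrying on NetworkError.

    Errors whose name is in free_retry_errors are retried without consuming
    retry_limit; every other NetworkError consumes one retry. The
    max_total_attempts cap is an assumption added here so that a host that
    never comes back cannot make the loop spin forever.
    """
    retries_left = retry_limit
    for _ in range(max_total_attempts):
        try:
            return op()
        except NetworkError as err:
            if err.name in free_retry_errors:
                continue  # treated as transient: retry, budget untouched
            if retries_left == 0:
                raise  # budget exhausted by errors that do consume it
            retries_left -= 1
    raise RuntimeError("gave up after max_total_attempts")
```

Under these assumptions, a long run of "Connection refused" or "Connection reset by peer" errors exhausts the retry limit even when the host is only briefly down, whereas the same run of HostUnreachable errors does not. Passing a wider free_retry_errors set is one way a suite that kills shard processes could model the other errors as transient too.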

Assignee: George Wangensteen (Inactive)

Reporter: Brett Nawrocki

Participants: Brett Nawrocki, George Wangensteen

Votes: 0

Watchers: 2

Created: Jul 03 2024 06:30:11 PM UTC

Updated: Jul 15 2024 07:30:45 PM UTC

Resolved: Jul 15 2024 07:30:44 PM UTC

Confidence Status Last Update: 15/Jul/24 5:39 PM

GA Target Date: None

Public Preview Target Date: None

Private Preview Target Date: None

Experiment Target Date: None
