  1. Core Server
  2. SERVER-92120

network_error_and_txn_override.js Should Be More Accommodating of Network Errors in Kill Primary Tests

    • Type: Bug
    • Resolution: Works as Designed
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Service Arch
    • ALL
    • Workload Scheduling 2024-07-22
    • 0

      BF-33912's lone BFG at the time of writing appears to have been caused by a replication error on a TSAN variant (which is quite slow), which left a host down for longer than usual (see the comments here for details).

      As a result, the client threads received a mix of "Connection reset by peer," "Connection refused," and "HostUnreachable" errors, but only HostUnreachable is treated as a retriable error that does not count against the retry limit.

      In suites where we kill or terminate shard processes, network errors should be expected more frequently, and they should be treated as transient.
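
The behavior requested above can be illustrated with a minimal sketch. This is not the code in network_error_and_txn_override.js: the names `TRANSIENT_NETWORK_PATTERNS`, `isTransientNetworkError`, `runWithRetries`, `maxRetries`, and `maxNetworkRetries` are all hypothetical, and matching on message text is an assumption made only to keep the example self-contained. The idea is that all three errors named in this ticket are classified as transient and are retried without consuming the ordinary retry limit.

```javascript
// Hypothetical sketch. None of these names come from network_error_and_txn_override.js.

// Error text treated as transient while a node is being killed or restarted
// (assumed list, taken from the three errors named in this ticket).
const TRANSIENT_NETWORK_PATTERNS = [
    /HostUnreachable/,
    /Connection reset by peer/,
    /Connection refused/,
];

function isTransientNetworkError(err) {
    const text = `${err.codeName || ""} ${err.message || ""}`;
    return TRANSIENT_NETWORK_PATTERNS.some((re) => re.test(text));
}

// Runs `op`, retrying on failure. Transient network errors do not consume
// `maxRetries`; they are bounded separately by `maxNetworkRetries` so the
// loop still terminates if the host never comes back.
function runWithRetries(op, {maxRetries = 3, maxNetworkRetries = 100} = {}) {
    let retriesUsed = 0;
    let networkRetriesUsed = 0;
    while (true) {
        try {
            return op();
        } catch (err) {
            if (isTransientNetworkError(err) && networkRetriesUsed < maxNetworkRetries) {
                networkRetriesUsed++;
                continue;
            }
            if (retriesUsed >= maxRetries) {
                throw err;
            }
            retriesUsed++;
        }
    }
}
```

A real test override would presumably also wait between attempts; the sketch omits any delay to stay short. The separate `maxNetworkRetries` bound is a design choice of this example, so that a host that stays down indefinitely still fails the test rather than hanging it.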

            Assignee:
            george.wangensteen@mongodb.com George Wangensteen
            Reporter:
            brett.nawrocki@mongodb.com Brett Nawrocki
            Votes:
            0
            Watchers:
            2

              Created:
              Updated:
              Resolved: