Drivers / DRIVERS-3084

Retry any error that prevents getting a parseable response from the Atlas API

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Unknown
    • Component/s: Astrolabe, Atlas Testing

      Summary

      The Astrolabe Atlas API client sometimes doesn't retry when there are errors calling the API. Cases we've observed that caused a task failure include:

      • The API returned an incomplete JSON blob and Astrolabe failed to parse it.
      • The API returned an error message that indicated an intermittent API error, but misused the HTTP status code 400.

      There is currently logic that retries API requests, but it only retries when there is an error getting a response at all. Instead, we should retry every request that doesn't return a parseable API message, independent of the HTTP status code. There is a risk that we could retry a request that will never succeed, but Astrolabe uses static concurrency and generally doesn't make many API requests, so unnecessary retries are preferable to unnecessary failures.
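A minimal Python sketch of the proposed retry rule follows. It is not Astrolabe's actual client code: `request_with_retries`, the `send` callable (assumed to return an `(http_status, body_text)` pair, or raise `OSError` when no response arrives), `AtlasApiRetryError`, and the attempt/backoff defaults are all hypothetical. The point it illustrates is that the decision to retry depends only on whether a parseable JSON body came back, never on the HTTP status code.

```python
import json
import time
from typing import Any, Callable, Tuple

# Hypothetical transport: returns (http_status, raw_body_text), or raises
# OSError (e.g. ConnectionError, socket timeouts) when no response arrived.
SendFn = Callable[[], Tuple[int, str]]


class AtlasApiRetryError(Exception):
    """Hypothetical error raised when no attempt yielded a parseable body."""


def request_with_retries(
    send: SendFn,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Any]:
    """Call `send` until it returns a JSON-parseable body.

    The HTTP status code is deliberately ignored when deciding whether to
    retry: only a transport error or an unparseable body triggers a retry.
    The caller still receives the status and can act on it afterwards.
    """
    last_error: Exception = AtlasApiRetryError("no attempts made")
    for attempt in range(1, max_attempts + 1):
        try:
            status, body = send()
            return status, json.loads(body)
        except (OSError, json.JSONDecodeError) as exc:
            last_error = exc
        if attempt < max_attempts:
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AtlasApiRetryError(
        f"no parseable response after {max_attempts} attempts"
    ) from last_error


# Usage: simulate a truncated JSON blob, then a dropped connection,
# then a good response.
_responses = iter([
    (200, '{"results": ['),             # incomplete JSON, retried
    ConnectionResetError("reset"),      # no response at all, retried
    (200, '{"results": []}'),           # parseable, returned
])


def flaky_send() -> Tuple[int, str]:
    item = next(_responses)
    if isinstance(item, Exception):
        raise item
    return item


status, payload = request_with_retries(flaky_send, sleep=lambda _: None)
print(status, payload)  # prints: 200 {'results': []}
```

Passing `sleep` as a parameter keeps the backoff testable without real delays. Because a parseable error body is returned rather than retried, a status-specific policy (for example, raising on a parseable 4xx) can still be layered on top by the caller.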

      Motivation

      Who is the affected end user?

      Astrolabe maintainers and DBX devs.

      How does this affect the end user?

      Astrolabe maintainers need to manually restart jobs, which takes up time. DBX devs have to sift through the noise of intermittent failures, which obscures the real test data.

      How likely is it that this problem or use case will occur?

      The cloud-qa Atlas env is intermittently unstable. The API failures tend to happen a few times a month on average.

      If the problem does occur, what are the consequences and how severe are they?

      Wasted time and obscured test results.

      Is this issue urgent?

      No.

      Is this ticket required by a downstream team?

      No.

      Is this ticket only for tests?

      No.

      Acceptance Criteria

      What specific requirements must be met to consider the design phase complete?

            Assignee:
            jib.adegunloye@mongodb.com Jib Adegunloye
            Reporter:
            matt.dale@mongodb.com Matt Dale
            Votes:
            0
            Watchers:
            1
