  1. Drivers
  2. DRIVERS-2363

Catch all Astrolabe setup exceptions and mark them as setup failures instead of task failures

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Unknown
    • Component/s: Astrolabe

      Summary

      Some exceptions that can happen during Astrolabe setup are not recognized as part of setup and are registered as task failures instead of setup failures. Update Astrolabe exception handling to catch all exceptions that can happen during setup and defer handling those errors to the "check-cloud-failure" command that runs after the main test run.

      Detailed Description

      The Atlas cluster setup happens during the run-one command, which is configured as type "test" in the Evergreen config (displays as "test failure"). It seems like we really want to check for cloud failure in the check-cloud-failure command, which is configured as type "setup" in the Evergreen config (displays as "setup failure"). Deferring cloud setup failure checking depends on a try/except block in the runner, which expects a very specific set of exception types and error messages. However, some HTTP timeout exceptions are thrown from HTTP calls in __init__ functions, which run outside that try/except block, so they are never caught by it.
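      The gap described above can be illustrated with a minimal sketch. All names here (SetupTimeout, fetch_cluster_config, Runner, run_one) are hypothetical and are not taken from the Astrolabe code base. The only point is that a try/except inside a method cannot catch an exception raised while the object is still being constructed.

```python
class SetupTimeout(Exception):
    """Hypothetical stand-in for an HTTP timeout raised during cloud setup."""


def fetch_cluster_config():
    # Hypothetical HTTP call that times out, e.g. during a cloud deployment.
    raise SetupTimeout("request to cloud API timed out")


class Runner:
    def __init__(self):
        # The HTTP call happens during construction, outside any try/except.
        self.config = fetch_cluster_config()

    def run(self):
        try:
            return "ran with %s" % self.config
        except SetupTimeout:
            # Never reached for errors raised in __init__.
            return "deferred to setup check"


def run_one():
    # A caller that relies on the handler inside run(): the exception
    # from __init__ propagates before run() is ever entered.
    return Runner().run()
```

      In this sketch, calling run_one() raises SetupTimeout instead of reaching the handler in run(), which is the kind of failure that ends up reported as a task failure.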

      The exception handling block that attempts to defer errors caused by cloud setup to a following "setup"-type Evergreen command misses many possible setup exceptions. We need to refactor the exception handling logic to catch all exceptions that can happen during initialization and cluster setup (e.g. by moving the try/except block to where the runner is initialized and called).
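      One possible shape for that refactor is sketched below. It is an illustration under assumptions, not Astrolabe's actual implementation: the sentinel file, the function names, and the exit-code convention are all hypothetical. The try/except wraps both construction and the call, any exception is recorded for later, and a separate check reports the failure.

```python
import json
import os


class SetupTimeout(Exception):
    """Hypothetical stand-in for an HTTP timeout raised during cloud setup."""


class Runner:
    def __init__(self, fail_in_init=False):
        # Stand-in for cluster setup work done at construction time.
        if fail_in_init:
            raise SetupTimeout("timed out creating cluster")

    def run(self):
        # Stand-in for the test workload; True means the test passed.
        return True


def run_one(sentinel_path, fail_in_init=False):
    """Exit status for the "test"-type command (hypothetical convention)."""
    try:
        # Construction and the call are both inside the try block.
        runner = Runner(fail_in_init=fail_in_init)
        passed = runner.run()
    except Exception as exc:  # broad on purpose; a real version might narrow this
        # Record the failure so the later "setup"-type command can report it.
        with open(sentinel_path, "w") as f:
            json.dump({"type": type(exc).__name__, "message": str(exc)}, f)
        return 0  # do not fail the "test"-type command
    return 0 if passed else 1


def check_cloud_failure(sentinel_path):
    """Exit status for the later "setup"-type command (hypothetical)."""
    return 1 if os.path.exists(sentinel_path) else 0
```

      In this sketch, genuine test failures are still reported through the return value of run(), while anything raised during initialization or the call is deferred to check_cloud_failure. Catching the base Exception class would also defer unexpected errors raised by the test itself, so where to draw that line is a design decision for the real change.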
       

      Motivation

      Who is the affected end user?

      People supporting Astrolabe. Possibly driver devs who are erroneously notified about driver failures.

      How does this affect the end user?

      An Astrolabe build will fail with a task failure instead of a setup failure.

      How likely is it that this problem or use case will occur?

      Reasonably likely, especially if deployments to the "Cloud QA" Atlas environment are happening during an Astrolabe run.

      If the problem does occur, what are the consequences and how severe are they?

      It is confusing to someone trying to debug the Astrolabe build failure.

      Is this issue urgent?

      No.

      Is this ticket required by a downstream team?

      No.

      Is this ticket only for tests?

      Yes.

            Assignee:
            matt.dale@mongodb.com Matt Dale
            Reporter:
            matt.dale@mongodb.com Matt Dale