-
Type: New Feature
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: Testing Infrastructure
-
Fully Compatible
-
TIG 2017-12-18
The changes from EVG-1222 added an event to the "host logs" to indicate if the EC2 instance was terminated. This has been known to happen (most often on the Windows 2008R2 DEBUG builder) due to increases in the spot price. When the EC2 instance is terminated, a CTRL_SHUTDOWN_EVENT is sent to the mongod processes causing them to exit and for tests communicting with those servers to fail. Additionally, an IOError "Interrupted function call" exception is raised to indicate that waiting for the test queue to become empty was interrupted. This leads to a race where (1) resmoke.py may exit with a nonzero code and (2) the Evergreen agent runs the attach.results command to indicate the Evergreen task has failed prior to the EC2 being terminated.
We should instead capture the IOError exception and have resmoke.py exit with a specific return code. The "run tests" function in etc/evergreen.yml should then check that return code and defer causing the task to fail in a subsequent shell.exec command of type=system. See here for how this is done with existing Jepsen tasks.
- is depended on by
-
SERVER-35472 resmoke.py shouldn't fall back to stderr when logkeeper is unavailable
- Closed