- Type: Bug
- Resolution: Unresolved
- Priority: Critical - P2
- Affects Version/s: None
- Component/s: None
- Services & Integrations
- ALL
When Evergreen hits the idle timeout for a test, it sends a SIGABRT signal to all the mongo processes (mongos/mongod).
The mongo processes then print the received signal:
[j1:s0:prim] | 2024-05-13T07:29:58.037+00:00 F CONTROL 6384300 [S] [initandlisten] "Writing fatal message","attr":{"message":"Got signal: 3 (Quit).
They also print all the current stack traces.
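To check which signal a process reported, the signal number and name can be pulled out of such a log line. The following is a minimal sketch, not part of resmoke or the log analyzer: the function name `parse_fatal_signal` is hypothetical, and the pattern is based only on the excerpt above, so other message formats are not handled.

```python
import re
from typing import Optional, Tuple

# Pattern derived from the log excerpt above (an assumption, not an official format).
_SIGNAL_RE = re.compile(r"Got signal: (\d+) \(([^)]+)\)")


def parse_fatal_signal(log_line: str) -> Optional[Tuple[int, str]]:
    """Return (signal number, signal name) if the line reports a fatal signal."""
    match = _SIGNAL_RE.search(log_line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


line = '"Writing fatal message","attr":{"message":"Got signal: 3 (Quit).'
print(parse_fatal_signal(line))  # (3, 'Quit')
```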
In this scenario, Evergreen will categorize the task and tests as follows:
- The tasks will be marked as "Task timed out".
- The test will be marked as "Failed".
- The associated BFGs will be marked with "Server crash" severity. I believe this is because the log analyzer finds the stack traces printed on quit.
This is the same categorization we would apply for a real server crash. Thus, it is currently very difficult to distinguish a BFG that failed due to reaching the idle timeout from a BFG that failed due to a server crash.
To differentiate the two, I suggest that when a test times out due to reaching the idle timeout, we do the following:
- The tasks should be marked as "Test timed out".
- The tests should be marked as "Test timed out" as well.
- The associated BFGs should be marked as "Server hang", or at least not marked as "Server crash".
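The categorization proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Evergreen's or the log analyzer's actual logic: the function `classify_failure`, the `hit_idle_timeout` flag, and the "Unknown" fallback label are all hypothetical, and the signal pattern is taken from the log excerpt in this ticket.

```python
import re
from typing import Iterable

# Pattern derived from the log excerpt in this ticket (an assumption).
_SIGNAL_RE = re.compile(r"Got signal: (\d+) \(([^)]+)\)")


def classify_failure(log_lines: Iterable[str], hit_idle_timeout: bool) -> str:
    """Pick a BFG severity label (hypothetical helper).

    If the idle timeout fired, the fatal-signal message and stack traces were
    requested by the timeout handling, so they are not treated as a crash.
    """
    saw_fatal_signal = any(_SIGNAL_RE.search(line) for line in log_lines)
    if hit_idle_timeout:
        return "Server hang"
    if saw_fatal_signal:
        return "Server crash"
    return "Unknown"


logs = ['"Writing fatal message","attr":{"message":"Got signal: 3 (Quit).']
print(classify_failure(logs, hit_idle_timeout=True))   # Server hang
print(classify_failure(logs, hit_idle_timeout=False))  # Server crash
```

The key design choice in this sketch is that the timeout flag is checked before the log contents, so stack traces produced by the timeout signal cannot be mistaken for a crash.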
This is an example of a BFG that timed out and was wrongly marked as "Server crash"
- is related to SERVER-87332 Investigate changing resmoke to use SIGABRT instead of SIGQUIT for {T/A}SAN variants (Closed)