- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Fully Compatible
- ALL
- Service Arch 2022-05-16, Service Arch 2022-05-30
- 130
- 2
When an invariant fails, we end the process abruptly by calling std::abort. This raises SIGABRT, for which we install our own signal handler. The handler attempts to print the signal that was received, log a stack trace, and exit the process in a way that tries to ensure a core dump is taken (on Windows, taking a core dump involves extra work and cannot simply be left to the OS). These stack traces and core dumps are very valuable in resolving BFs.
On Unix-like systems, it should be safe for an invariant to fail concurrently on different threads: a signal handler is not interrupted by further arrivals of the signal it is currently handling, so the abort handler can run to completion. On Windows, however, the emulation of signals behaves differently. When the first signal is received, and before the user-provided handler runs, the handler is swapped out for SIG_DFL (the default handler). While the user-provided handler runs, further arrivals of the same signal are therefore directed to the default handler instead. In practice, if a second SIGABRT arrives while the handler is running, the process may terminate before the first handler has run to completion, leaving the crash with no stack trace or core dump.
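The difference in handler lifetime can be observed with a small probe. This is only a sketch: the names `signal_demo`, `deliveries`, `onAbortSignal`, and `handlerSurvivesDelivery` are hypothetical and not taken from the server code. It uses `std::raise` rather than `std::abort`, so that the handler can return and the process keeps running.

```cpp
#include <atomic>
#include <csignal>

namespace signal_demo {

// Counts handler invocations. A lock-free atomic is one of the few
// things that is safe to touch from inside a signal handler.
std::atomic<int> deliveries{0};

extern "C" void onAbortSignal(int) {
    deliveries.fetch_add(1, std::memory_order_relaxed);
}

// Installs the handler, delivers SIGABRT once, then reports whether the
// handler is still the installed disposition after that delivery.
bool handlerSurvivesDelivery() {
    std::signal(SIGABRT, onAbortSignal);
    std::raise(SIGABRT);  // the handler returns, so execution continues here
    // std::signal returns the previous disposition; as a side effect this
    // call also restores SIG_DFL.
    return std::signal(SIGABRT, SIG_DFL) == onAbortSignal;
}

}  // namespace signal_demo
```

On typical Unix-like C libraries, where `signal` keeps the handler installed, `handlerSurvivesDelivery()` should return true. Under the Windows behavior described above, it would be expected to return false, because the disposition has already been reset to SIG_DFL by the time the handler runs.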
Potential solutions:
- Add an atomic bool guard so that abort is called only once per process.
- Add an atomic guard in the invariant-failure path (after determining that the invariant has failed).
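A minimal sketch of the atomic-guard idea, assuming it is acceptable for threads that lose the race to block until the process exits. All names here (`abort_guard`, `abortStarted`, `claimAbort`, `parkForever`, `invariantFailedSketch`) are hypothetical and do not reflect the actual fix.

```cpp
#include <atomic>
#include <chrono>
#include <cstdlib>
#include <thread>

namespace abort_guard {

// Hypothetical process-wide flag: set once some thread has started aborting.
std::atomic<bool> abortStarted{false};

// Returns true for exactly one caller (the first); every later caller,
// on any thread, gets false.
bool claimAbort() noexcept {
    return !abortStarted.exchange(true, std::memory_order_acq_rel);
}

// Losing threads wait here so they cannot raise a second SIGABRT while
// the first thread's handler is still writing the stack trace and dump.
[[noreturn]] void parkForever() noexcept {
    for (;;) {
        std::this_thread::sleep_for(std::chrono::hours(1));
    }
}

// Called only after the invariant has already been determined to have failed.
[[noreturn]] void invariantFailedSketch() noexcept {
    if (claimAbort()) {
        std::abort();  // only the winning thread raises SIGABRT
    }
    parkForever();
}

}  // namespace abort_guard
```

The exchange makes the decision atomic, so two threads failing invariants at the same time cannot both reach `std::abort`. Note that this only covers aborts that go through the guarded path; a direct call to `std::abort` elsewhere would bypass it.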