  1. Core Server
  2. SERVER-73944

Log an explicit, searchable message that test harness is killing the test

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Minor - P4
    • Fix Version/s: 7.2.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Server Development Platform
    • Fully Compatible
    • DAG 2023-10-16, DAG 2023-10-30
    • 2

      Can the test harness please log an explicit and informative message that it is about to kill the test processes before doing so? The current behavior is not easily recognizable as an intentional kill by the test harness and can easily be mistaken for a server crash.

      Example from BF-27450:

      Writing fatal message
      message:
      ExternalRecordStoreTest
      NamedPipeMultiplePipes4
      Writing fatal message
      message: Got signal: 6 (Abort trap: 6).
      mongo::stack_trace_detail::(anonymous namespace)::getStackTraceImpl(mongo::stack_trace_detail::(anonymous namespace)::Options const&)
      mongo::printStackTrace()
      abruptQuit
      _sigtramp
      __srefill1
      __fread
      fread
      std::__1::basic_filebuf<char, std::__1::char_traits<char> >::underflow()
      std::__1::basic_streambuf<char, std::__1::char_traits<char> >::uflow()
      std::__1::basic_streambuf<char, std::__1::char_traits<char> >::xsgetn(char*, long)
      std::__1::basic_istream<char, std::__1::char_traits<char> >::read(char*, long)
      mongo::NamedPipeInput::doRead(char*, int)
      mongo::InputStream<mongo::NamedPipeInput>::readBytes(int, char*)
      mongo::MultiBsonStreamCursor::nextFromCurrentStream()
      mongo::MultiBsonStreamCursor::next()
      mongo::UnitTest_SuiteNameExternalRecordStoreTestTestNameNamedPipeMultiplePipes4::_doTest()
      mongo::unittest::Test::run()
      

      I am told that I am not the only person who has been fooled by this. It looked to me as though the test had timed out because the server had crashed, dumped its stack, and therefore stopped making progress. In reality, the server had stopped making progress an hour earlier, and the test harness then sent "kill -6" to abort the test.

      A message like the following would save time otherwise spent investigating the wrong thing, and would make it easier for people using Parsley to find the failure (currently, the logs of timed-out tests do not contain the word "timeout" anywhere):

      TEST TIMEOUT FAILURE ABORT: Aborting the test via "kill -6" because it has not made progress for one hour. 
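
      To illustrate the request, here is a minimal Python sketch of a harness logging the searchable line immediately before sending the abort signal. This is not the harness's actual implementation: the names abort_stalled_test, format_abort_message, TIMEOUT_MARKER, and the injectable kill parameter are hypothetical and exist only for illustration.

```python
import logging
import os
import signal

logger = logging.getLogger("harness")  # hypothetical logger name

# Fixed, greppable prefix so a log viewer can search for the timeout.
TIMEOUT_MARKER = "TEST TIMEOUT FAILURE ABORT"


def format_abort_message(idle_seconds: float) -> str:
    """Build the line that is logged right before the kill."""
    minutes = int(idle_seconds // 60)
    return (
        f'{TIMEOUT_MARKER}: Aborting the test via "kill -6" because it has '
        f"not made progress for {minutes} minutes."
    )


def abort_stalled_test(pid: int, idle_seconds: float, kill=os.kill) -> str:
    """Log the explicit message first, then send SIGABRT to the process.

    `kill` is injectable (an assumption of this sketch) so the ordering
    can be checked without signalling a real process.
    """
    message = format_abort_message(idle_seconds)
    logger.error(message)        # the message appears before the stack dump
    kill(pid, signal.SIGABRT)    # SIGABRT is signal 6 on Linux and macOS
    return message
```

      Logging before signalling means the marker line precedes the "Got signal: 6" stack dump in the test output, so a reader sees the cause before the symptom.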

       

            Assignee:
            mikhail.shchatko@mongodb.com Mikhail Shchatko
            Reporter:
            kevin.cherkauer@mongodb.com Kevin Cherkauer
            Votes:
            0
            Watchers:
            9
