  1. Core Server
  2. SERVER-72860

Python exceptions in create_fixture_table() cause resmoke to incorrectly mark Evergreen tasks as setup failures

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical - P2
    • Fix Version/s: 6.3.0-rc0
    • Affects Version/s: 5.0.0, 6.0.0, 6.2.0-rc6
    • Component/s: Testing Infrastructure
    • None
    • Server Development Platform
    • Fully Compatible
    • ALL
    • Steps To Reproduce:
      python buildscripts/resmoke.py run --suite=sharding_jscore_passthrough --log=buildlogger jstests/core/query/all/all.js
      
      diff --git a/buildscripts/resmokelib/logging/buildlogger.py b/buildscripts/resmokelib/logging/buildlogger.py
      index 1ff2689ea64..d1e89db4278 100644
      --- a/buildscripts/resmokelib/logging/buildlogger.py
      +++ b/buildscripts/resmokelib/logging/buildlogger.py
      @@ -266,10 +266,7 @@ class BuildloggerServer(object):
           def __init__(self):
               """Initialize BuildloggerServer."""
               tmp_globals = {}
      -        self.config = {}
      -        exec(
      -            compile(open(_BUILDLOGGER_CONFIG, "rb").read(), _BUILDLOGGER_CONFIG, 'exec'),
      -            tmp_globals, self.config)
      +        self.config = dict(username="u", password="p", builder="b", build_num="1")
      
               # Rename "slavename" to "username" if present.
               if "slavename" in self.config and "username" not in self.config:
      diff --git a/buildscripts/resmokelib/logging/flush.py b/buildscripts/resmokelib/logging/flush.py
      index 16335ef44ab..a2b32d5e9a2 100644
      --- a/buildscripts/resmokelib/logging/flush.py
      +++ b/buildscripts/resmokelib/logging/flush.py
      @@ -35,7 +35,7 @@ def stop_thread():
           _FLUSH_THREAD.signal_shutdown()
           # Wait for 1min instead of _FLUSH_THREAD.await_shutdown() because we can
           # sometimes wait indefinitely for a response, causing a task timeout.
      -    _FLUSH_THREAD.join(60)
      +    _FLUSH_THREAD.join(5)
      
           success = not _FLUSH_THREAD.is_alive()
           return success
      diff --git a/buildscripts/resmokelib/logging/handlers.py b/buildscripts/resmokelib/logging/handlers.py
      index 29292a3bdef..5ff2d068762 100644
      --- a/buildscripts/resmokelib/logging/handlers.py
      +++ b/buildscripts/resmokelib/logging/handlers.py
      @@ -192,6 +192,8 @@ class HTTPHandler(object):
               on the content type.
               """
      
      +        return dict(id="fake_id")
      +
               data = utils.default_if_none(data, [])
               data = json.dumps(data)
      
      diff --git a/buildscripts/resmokelib/testing/fixtures/shardedcluster.py b/buildscripts/resmokelib/testing/fixtures/shardedcluster.py
      index dddb01ca8d2..dbfdabf1018 100644
      --- a/buildscripts/resmokelib/testing/fixtures/shardedcluster.py
      +++ b/buildscripts/resmokelib/testing/fixtures/shardedcluster.py
      @@ -267,6 +267,7 @@ class ShardedClusterFixture(interface.Fixture):
               output = []
               for shard in self.shards:
                   output += shard.get_node_info()
      +        raise AttributeError("Intentionally raised")
               for mongos in self.mongos:
                   output += mongos.get_node_info()
               return output + self.configsvr.get_node_info()
      
    • 159

      Some of the commits impacted by BF-27442 had a large number of setup failures. The log excerpt below is one example.

      [2023/01/11 04:17:48.924] [executor:js_test:job0] 04:17:48.919Z The setup of ShardedClusterFixture (Job #0) failed.
      [2023/01/11 04:17:48.932] [executor:js_test:job0] 04:17:48.928Z Encountered an error when tearing down the fixture.
      [2023/01/11 04:17:48.932] Traceback (most recent call last):
      [2023/01/11 04:17:48.932]   File "/data/mci/4d10ab542ccbbb72ec0cfdae37347ba8/src/buildscripts/resmokelib/testing/job.py", line 95, in __call__
      [2023/01/11 04:17:48.932]     teardown_succeeded = self.manager.teardown_fixture(self.logger)
      [2023/01/11 04:17:48.932]   File "/data/mci/4d10ab542ccbbb72ec0cfdae37347ba8/src/buildscripts/resmokelib/testing/job.py", line 384, in teardown_fixture
      [2023/01/11 04:17:48.932]     self.report.logging_prefix = create_fixture_table(self.fixture)
      [2023/01/11 04:17:48.932]   File "/data/mci/4d10ab542ccbbb72ec0cfdae37347ba8/src/buildscripts/resmokelib/testing/fixtures/interface.py", line 360, in create_fixture_table
      [2023/01/11 04:17:48.932]     info: List[NodeInfo] = fixture.get_node_info()
      [2023/01/11 04:17:48.932]   File "/data/mci/4d10ab542ccbbb72ec0cfdae37347ba8/src/buildscripts/resmokelib/testing/fixtures/shardedcluster.py", line 271, in get_node_info
      [2023/01/11 04:17:48.932]     output += mongos.get_node_info()
      [2023/01/11 04:17:48.932]   File "/data/mci/4d10ab542ccbbb72ec0cfdae37347ba8/src/buildscripts/resmokelib/testing/fixtures/shardedcluster.py", line 573, in get_node_info
      [2023/01/11 04:17:48.932]     port=self.port, pid=self.mongos.pid)
      [2023/01/11 04:17:48.932] AttributeError: 'NoneType' object has no attribute 'pid'
      ...
      [2023/01/11 04:19:00.678] [resmoke] 04:19:00.678Z Failed to flush all logs within a reasonable amount of time, treating logs as incomplete
      [2023/01/11 04:19:00.678] [resmoke] 04:19:00.678Z Exiting with code 75 rather than requested code 2 because we failed to flush all log output to logkeeper.
      

      https://parsley.mongodb.com/evergreen/mongodb_mongo_master_enterprise_rhel_80_64_bit_dynamic_all_feature_flags_required_sharding_jscore_passthrough_3a842713b25c2945fe1884abd8e60203f37f6258_23_01_11_03_08_29/0/task?bookmarks=0,1522,1597,1598&selectedLine=1522

      The Build Barons intentionally ignore setup failures, so a real failure that is misreported as a setup failure can delay the identification of true failures. (Setup failures are ignored because Logkeeper instability has been generally accepted and accommodated within the testing infrastructure; see SERVER-35472. The concept of setup failures may be worth revisiting now that Logkeeper has moved to S3, but I consider that outside the scope of this issue.)
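The exit-code substitution visible in the log excerpt above (75 reported instead of the requested 2 once the log flush fails) can be sketched as follows. This is only an illustration: `resolve_exit_code` and `EXIT_LOGS_INCOMPLETE` are hypothetical names, not resmoke identifiers, and the value 75 is taken from the log line rather than from the resmoke source.

```python
# Value copied from the log excerpt; the constant name is hypothetical.
EXIT_LOGS_INCOMPLETE = 75


def resolve_exit_code(requested_code: int, logs_flushed: bool) -> int:
    """Return the exit code to report, given whether all logs were flushed.

    Assumption for this sketch: an incomplete flush always overrides the
    requested code, which is what the log excerpt shows for a requested code 2.
    """
    if not logs_flushed:
        return EXIT_LOGS_INCOMPLETE
    return requested_code
```

Under this assumption, a test failure (requested code 2) is hidden behind the flush-failure code whenever the flush thread does not finish in time, which is how a genuine failure can end up being reported as something else.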

      It looks like the changes to standalone.py in 3805148, made as part of SERVER-66045, ensured that get_node_info() no longer raises an exception when fixture setup has failed for mongod. However, there is an equivalent unhandled case when fixture setup has failed for mongos, which is why the setup failures observed here all occur with ShardedClusterFixture.
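A minimal sketch of that kind of guard applied to the mongos case. The classes below (`NodeInfo`, `FakeProcess`, `MongosFixtureSketch`) are hypothetical stand-ins, not the actual resmoke code. The only details taken from the traceback are that `self.mongos` can be `None` after a failed setup and that `get_node_info()` reads `self.mongos.pid`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeInfo:
    """Hypothetical stand-in for resmoke's NodeInfo record."""
    name: str
    port: int
    pid: int


class FakeProcess:
    """Hypothetical stand-in for a started mongos process handle."""

    def __init__(self, pid: int):
        self.pid = pid


class MongosFixtureSketch:
    """Illustrative fixture whose process handle stays None until setup succeeds."""

    def __init__(self, port: int):
        self.port = port
        self.mongos: Optional[FakeProcess] = None

    def setup(self, pid: int) -> None:
        # In this sketch, "setup" just records a process handle.
        self.mongos = FakeProcess(pid)

    def get_node_info(self) -> List[NodeInfo]:
        # If setup never produced a process, there is nothing to describe.
        # Return an empty list instead of dereferencing None.
        if self.mongos is None:
            return []
        return [NodeInfo(name="mongos", port=self.port, pid=self.mongos.pid)]
```

With this guard, a fixture whose setup failed contributes no rows to the fixture table rather than raising `AttributeError` during teardown.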

      Note: The uncaught exception at fixture teardown also causes resmoke to leak processes upon exit. It may be worthwhile to revisit whether the calls to create_fixture_table() in job.py should have their own try/except block too.
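One way the try/except suggested in the note could look, as a sketch only. `safe_fixture_table`, `BrokenFixture`, and `HealthyFixture` are hypothetical names, and the `create_fixture_table` defined here is a simplified stand-in for the function in interface.py, not its real implementation.

```python
import logging


def create_fixture_table(fixture) -> str:
    """Simplified stand-in: one line per node reported by the fixture."""
    return "\n".join(f"{name} pid={pid}" for name, pid in fixture.get_node_info())


def safe_fixture_table(fixture, logger: logging.Logger) -> str:
    """Build the fixture table, but never let a failure abort teardown."""
    try:
        return create_fixture_table(fixture)
    except Exception:
        # Log the traceback and fall back to an empty prefix so the caller
        # can continue tearing down the fixture and stopping processes.
        logger.exception("Could not build the fixture table; continuing teardown.")
        return ""


class BrokenFixture:
    """Mimics a fixture whose setup failed before a process was started."""

    def get_node_info(self):
        raise AttributeError("'NoneType' object has no attribute 'pid'")


class HealthyFixture:
    """Mimics a fixture with one running node."""

    def get_node_info(self):
        return [("mongos", 4321)]
```

The design choice in this sketch is to treat the fixture table as best-effort diagnostic output: losing it is preferable to skipping the rest of teardown and leaking processes.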

            Assignee:
            tausif.rahman@mongodb.com Tausif Rahman (Inactive)
            Reporter:
            max.hirschhorn@mongodb.com Max Hirschhorn
            Votes:
            0
            Watchers:
            4
