-
Type: Bug
-
Resolution: Fixed
-
Priority: Critical - P2
-
Affects Version/s: 5.0.0, 6.0.0, 6.2.0-rc6
-
Component/s: Testing Infrastructure
-
None
-
Server Development Platform
-
Fully Compatible
-
ALL
-
159
Some of the commits impacted by BF-27442 had a large number of setup failures, for example:
[2023/01/11 04:17:48.924] [executor:js_test:job0] 04:17:48.919Z The setup of ShardedClusterFixture (Job #0) failed.
[2023/01/11 04:17:48.932] [executor:js_test:job0] 04:17:48.928Z Encountered an error when tearing down the fixture.
[2023/01/11 04:17:48.932] Traceback (most recent call last):
[2023/01/11 04:17:48.932]   File "/data/mci/4d10ab542ccbbb72ec0cfdae37347ba8/src/buildscripts/resmokelib/testing/job.py", line 95, in __call__
[2023/01/11 04:17:48.932]     teardown_succeeded = self.manager.teardown_fixture(self.logger)
[2023/01/11 04:17:48.932]   File "/data/mci/4d10ab542ccbbb72ec0cfdae37347ba8/src/buildscripts/resmokelib/testing/job.py", line 384, in teardown_fixture
[2023/01/11 04:17:48.932]     self.report.logging_prefix = create_fixture_table(self.fixture)
[2023/01/11 04:17:48.932]   File "/data/mci/4d10ab542ccbbb72ec0cfdae37347ba8/src/buildscripts/resmokelib/testing/fixtures/interface.py", line 360, in create_fixture_table
[2023/01/11 04:17:48.932]     info: List[NodeInfo] = fixture.get_node_info()
[2023/01/11 04:17:48.932]   File "/data/mci/4d10ab542ccbbb72ec0cfdae37347ba8/src/buildscripts/resmokelib/testing/fixtures/shardedcluster.py", line 271, in get_node_info
[2023/01/11 04:17:48.932]     output += mongos.get_node_info()
[2023/01/11 04:17:48.932]   File "/data/mci/4d10ab542ccbbb72ec0cfdae37347ba8/src/buildscripts/resmokelib/testing/fixtures/shardedcluster.py", line 573, in get_node_info
[2023/01/11 04:17:48.932]     port=self.port, pid=self.mongos.pid)
[2023/01/11 04:17:48.932] AttributeError: 'NoneType' object has no attribute 'pid'
...
[2023/01/11 04:19:00.678] [resmoke] 04:19:00.678Z Failed to flush all logs within a reasonable amount of time, treating logs as incomplete
[2023/01/11 04:19:00.678] [resmoke] 04:19:00.678Z Exiting with code 75 rather than requested code 2 because we failed to flush all log output to logkeeper.
The Build Barons intentionally ignore setup failures, so this can delay the identification of true failures. (Setup failures are ignored because Logkeeper instability has been generally accepted and accommodated within the testing infrastructure; see SERVER-35472. The concept of setup failures may be worth revisiting now that Logkeeper has moved to S3, but I'm considering that outside the scope of this issue.)
It looks like the changes to standalone.py in 3805148, made as part of SERVER-66045, ensured that get_node_info() would not raise an exception when fixture setup had failed for mongod. However, there is an equivalent case when fixture setup fails for mongos, and that case is not handled. This is why the setup failures observed here all occur with the ShardedClusterFixture.
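A minimal sketch of the kind of guard described above, applied to the mongos case. This is not the actual resmoke code: NodeInfo here is a simplified stand-in, and FakeProcess and MongosLauncherSketch are hypothetical names used only for illustration. The sketch assumes that self.mongos stays None when setup fails, as the AttributeError in the traceback suggests.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeInfo:
    """Simplified stand-in for resmoke's NodeInfo (fields are an assumption)."""
    full_name: str
    port: int
    pid: int


class FakeProcess:
    """Hypothetical placeholder for the process handle a fixture holds."""

    def __init__(self, pid: int) -> None:
        self.pid = pid


class MongosLauncherSketch:
    """Hypothetical fixture whose process handle is None until setup succeeds."""

    def __init__(self, port: int) -> None:
        self.port = port
        self.mongos: Optional[FakeProcess] = None  # remains None if setup fails

    def get_node_info(self) -> List[NodeInfo]:
        # Guard against a failed or skipped setup: with no process there is
        # no pid to report, so return an empty list instead of raising.
        if self.mongos is None:
            return []
        return [NodeInfo(full_name="mongos", port=self.port, pid=self.mongos.pid)]
```

With this guard, a caller that concatenates the results from every node (as the `output += mongos.get_node_info()` line in the traceback does) would simply get nothing from a mongos that never started.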
Note: The uncaught exception at fixture teardown also causes resmoke to leak processes upon exit. It may be worthwhile to revisit whether the calls to create_fixture_table() in job.py should have their own try/except block too.
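One possible shape for such a try/except block, as a rough sketch only. The helper name safe_fixture_table is hypothetical, and the table-building function is passed in as a parameter so the example stays self-contained; the real job.py may structure this differently.

```python
import logging
from typing import Any, Callable, Optional


def safe_fixture_table(
    fixture: Any,
    create_fixture_table: Callable[[Any], str],
    logger: logging.Logger,
) -> Optional[str]:
    """Build the fixture table, but never let a failure abort teardown.

    Returns the table on success, or None if building it raised.
    """
    try:
        return create_fixture_table(fixture)
    except Exception:
        # Log the traceback and carry on, so the caller can still tear down
        # the fixture and avoid leaking processes.
        logger.exception("Could not build the fixture table; continuing with teardown.")
        return None
```

The design choice here is that the fixture table is diagnostic output only, so a failure to produce it is logged rather than allowed to prevent the teardown that follows.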
- is related to
-
SERVER-64151 resmoke.py fails when running with --repeat > 1
- Closed
-
SERVER-55548 resmoke.py reports stale "Fixture status" message during fixture teardown
- Closed
-
SERVER-50085 Make it easier to correlate mongo process names, ports, PIDs in logs of fixtures started by resmoke
- Closed
-
SERVER-66045 Run an unbounded number of splits during passthrough
- Closed