Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: WT11.3.0, 8.0.0-rc0, 7.3.0-rc2
Affects Version/s: None
Component/s: None
Labels: None
Assigned Teams: Storage Engines

There is a bug in the script that executes checkpoint-stress-test from our evergreen.yml.

Here's the relevant code:

        for i in $(seq ${times|1}); do
          for t in $(seq ${no_of_procs|1}); do
            eval nohup $CMD > nohup.out.$i.$t 2>&1 &
          done


          for t in $(seq ${no_of_procs|1}); do
            ret=0
            wait -n || ret=$?
            if [ $ret -ne 0 ]; then
              # Skip the below lines from nohup output file because they are very verbose and
              # print only the errors to evergreen log file.
              grep -v "Finished verifying" nohup.out.* | grep -v "Finished a checkpoint" | grep -v "thread starting"
            fi
            exit $ret   # <<============
          done
        done

The script first loops to start multiple instances of the test. It then loops again, waiting for the same number of instances to terminate. The problem is the exit command (marked with <<============). It runs on the first iteration of the second loop, regardless of whether the terminated instance succeeded or failed.

So if we run ten concurrent tests, this script will exit after the first of the ten completes, and therefore won't check for or report any errors from the other nine.

A simple fix would be to move the exit into the preceding if ... fi block, so we only exit early if one of the tests fails. A better fix would be to record the failure but allow the other concurrent tests to finish, so that any other failures are also reported.
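The second approach (record failures, keep waiting, fail at the end) could look roughly like the sketch below. It is an illustration under stated assumptions, not the change that was committed for this ticket: `run_all` is a hypothetical helper name, plain shell arguments stand in for the Evergreen `${no_of_procs|1}` expansion and `$CMD`, and it waits on each recorded PID rather than using `wait -n`, so that it does not depend on a particular bash version.

```shell
# Hypothetical helper, not taken from evergreen.yml: start N copies of a
# command, wait for ALL of them, and fail if any copy failed.
# The function body is a subshell, so its variables do not leak to the caller.
run_all() (
  no_of_procs=$1
  shift
  pids=""
  failures=0

  # Start the instances in the background and remember each PID.
  for t in $(seq "$no_of_procs"); do
    "$@" > "nohup.out.$t" 2>&1 &
    pids="$pids $!"
  done

  # Wait for every instance, even after one of them has failed.
  for pid in $pids; do
    ret=0
    wait "$pid" || ret=$?
    if [ "$ret" -ne 0 ]; then
      echo "instance with pid $pid exited with status $ret" >&2
      failures=$((failures + 1))
    fi
  done

  if [ "$failures" -ne 0 ]; then
    # Same filtering as the original script: drop the very verbose progress
    # lines and print only the remaining output.
    grep -v "Finished verifying" nohup.out.* | grep -v "Finished a checkpoint" | grep -v "thread starting"
  fi

  # The function's exit status: success only if no instance failed.
  [ "$failures" -eq 0 ]
)
```

A caller would invoke it once per outer iteration, for example `run_all 10 ./my_test || exit 1`, where `./my_test` is a placeholder for the real test command. Because the status is computed only after the second loop finishes, a failure in the first instance no longer hides failures in the others.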

Assignee: Keith Smith
Reporter: Keith Smith
Votes: 0
Watchers: 1
Created: Jan 28 2024 08:17:27 PM UTC
Updated: Mar 18 2024 05:48:00 PM UTC
Resolved: Jan 29 2024 03:02:13 PM UTC
