Type: Improvement
Resolution: Gone away
Priority: Major - P3
Affects Version/s: None
Component/s: None
Storage Engines - 2022-09-05
Our automated Evergreen tests did not capture sufficient information to easily identify the issue in WT-8349.
In that ticket, format failed because of an invalid value in its CONFIG file. Running the config manually using format.sh produces a useful error message and a core file:
$ ./format.sh -c CONFIG.49
format.sh: starting job in /data/mci/artifacts/test/format/RUNDIR.1 (Thu Nov 4 16:35:55 UTC 2021)
format.sh: ./t -c /data/mci/artifacts/test/format/CONFIG.49 -h /data/mci/artifacts/test/format/RUNDIR.1 quiet=1
./format.sh: line 584: 7887 Aborted (core dumped) nohup setsid $cmd > $log 2>&1
...
format.sh: job in /data/mci/artifacts/test/format/RUNDIR.1 failed
t: process 7887 running
t: FAILED: t: cache=1008483: value outside min/max values of 1-102400: Invalid argument
t: run FAILED
t: process aborting
WiredTiger Error: aborting WiredTiger library
format.sh: /data/mci/artifacts/test/format/RUNDIR.1 does not exist, format.sh unable to continue
./format.sh: line 358: 7897 Killed nohup setsid $cmd > $log 2>&1
format.sh: 0 successful jobs, 1 failed jobs
In the above, the output of the format log file includes the cause of the problem:
t: FAILED: t: cache=1008483: value outside min/max values of 1-102400: Invalid argument
The logs from the Evergreen task where this originally occurred are less helpful:
format.sh: job in /data/mci/67ee547007fad22ac2e27af848423d7c/wiredtiger/test/format/RUNDIR.49 exited with status 127 for an unknown reason
format.sh: reporting job in /data/mci/67ee547007fad22ac2e27af848423d7c/wiredtiger/test/format/RUNDIR.49 as a failure
format.sh: job in /data/mci/67ee547007fad22ac2e27af848423d7c/wiredtiger/test/format/RUNDIR.49 failed
t: process 45945 running
Killed
The above doesn't provide any information about why the failure happened, and the artifacts from this run don't include a core file, so I had to run the CONFIG manually to identify the problem. This wasn't hard, but it would have saved time if the useful error message had been captured and presented in the Evergreen log.
I haven't tried to identify the exact issue, but the difference between the Evergreen run and my local run appears to be what was written to the RUNDIR.xx.log file and then dumped by format.sh.
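As a rough illustration of what surfacing the cause could look like, here is a minimal sketch of a shell helper that pulls the failure lines out of a run's log file before the job is reported. The function name `report_failure_cause` and the matched patterns are hypothetical and are not part of format.sh; the patterns are simply taken from the local output shown above.

```shell
# Hypothetical helper, not taken from format.sh: print the lines of a
# run log that most likely explain a failure.
report_failure_cause() {
    log="$1"

    # An empty or missing log is itself worth reporting, since that is
    # what the Evergreen run appears to have produced.
    if [ ! -s "$log" ]; then
        echo "no log output captured in ${log}"
        return 1
    fi

    # Prefer explicit failure messages; if none match, fall back to the
    # last 20 lines of the log so something is always shown.
    grep -E 'FAILED|WiredTiger Error' "$log" || tail -n 20 "$log"
}
```

Called with the log of the local run, this would print the `cache=1008483` line directly instead of leaving it buried in the log dump.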
Note that my manual run was on a spawn host machine from the Evergreen test system.