The timestamp_abort run on margay in http://build.wiredtiger.com:8080/job/wiredtiger-test-recovery-stress/10176/console is hung. The reason is that a child thread got an error:
+ ./test_timestamp_abort -m
Parent: compatibility: false, in-mem log sync: true, timestamp in use: true
Parent: Create 10 threads; sleep 25 seconds
Create checkpoint thread
Create timestamp thread
Create 10 writer threads
Thread 1 starts at 1844674407370955161
Thread 0 starts at 0
[1509305328:849637][33984:0x7f22bd7f2700], WT_SESSION.commit_transaction: commit timestamp 947 older than oldest timestamp: Invalid argument
test_timestamp_abort: FAILED: thread_run/312: session->commit_transaction(session, tscfg): Invalid argument
process aborting
Thread 2 starts at 3689348814741910322
Thread 3 starts at 5534023222112865483
Thread 4 starts at 7378697629483820644
Thread 8 starts at 14757395258967641288
Thread 7 starts at 12912720851596686127
Thread 9 starts at 16602069666338596449
Thread 6 starts at 11068046444225730966
Thread 5 starts at 9223372036854775805
There is code in the parent that is supposed to detect that and not hang:
while (stat(statname, &sb) != 0 && kill(pid, 0) == 0)
        sleep(1);
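The problem is that the child (pid 33984) has terminated but has not been reaped by the parent, so it remains as a zombie process. kill(pid, 0) still succeeds for a zombie, so the loop condition stays true and the parent waits forever for a file the child will never create.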
(gdb) p pid
$1 = 33984
(gdb) detach
Detaching from program: /mnt/data0/jenkins/workspace/wiredtiger-test-recovery-stress/build_posix/test/csuite/test_timestamp_abort, process 33981
(gdb) attach 33984
Attaching to program: /mnt/data0/jenkins/workspace/wiredtiger-test-recovery-stress/build_posix/test/csuite/test_timestamp_abort, process 33984
warning: process 33984 is a zombie - the process has already terminated
ptrace: Operation not permitted.

jenkins 33984 0.0 0.0 0 0 ? Z Oct29 0:00 [test_timestamp_] <defunct>
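
One possible way to make the parent's wait loop notice a dead child is to poll with waitpid and WNOHANG instead of kill(pid, 0). The sketch below is only an illustration, not the test's actual code: the helper name wait_for_file_or_child_exit and its return convention are hypothetical, and it assumes the parent is free to reap the child at this point.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the test program): wait until the child
 * creates statname or the child terminates, whichever happens first.
 * Returns true if the file appeared, false if the child is gone.
 */
static bool
wait_for_file_or_child_exit(const char *statname, pid_t pid, int *statusp)
{
	struct stat sb;
	pid_t ret;

	assert(pid > 0);
	for (;;) {
		if (stat(statname, &sb) == 0)
			return (true);
		/*
		 * With WNOHANG, waitpid returns 0 while the child is still
		 * running and returns the pid once it has terminated. That
		 * call also reaps the child, so no zombie is left behind.
		 */
		ret = waitpid(pid, statusp, WNOHANG);
		if (ret == pid)
			return (false);
		/* Any error other than an interrupted call: stop waiting. */
		if (ret == -1 && errno != EINTR)
			return (false);
		sleep(1);
	}
}
```

A caller could replace the loop with wait_for_file_or_child_exit(statname, pid, &status) and treat a false return as a test failure. Because the helper reaps the child when it returns false, any later waitpid on the same pid would fail with ECHILD, so the exit status should be taken from status instead.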