  1. WiredTiger
  2. WT-3707

timestamp_abort updating timestamp out of order and parent process not handling failure

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 3.6.0-rc3, WT3.0.0
    • Affects Version/s: None
    • Component/s: None
    • None
    • Storage 2017-11-13

      The timestamp_abort run on margay (http://build.wiredtiger.com:8080/job/wiredtiger-test-recovery-stress/10176/console) is hung. A thread in the child process got an error from commit_transaction and the child aborted:

      + ./test_timestamp_abort -m
      Parent: compatibility: false, in-mem log sync: true, timestamp in use: true
      Parent: Create 10 threads; sleep 25 seconds
      Create checkpoint thread
      Create timestamp thread
      Create 10 writer threads
      Thread 1 starts at 1844674407370955161
      Thread 0 starts at 0
      [1509305328:849637][33984:0x7f22bd7f2700], WT_SESSION.commit_transaction: commit timestamp 947 older than oldest timestamp: Invalid argument
      test_timestamp_abort: FAILED: thread_run/312: session->commit_transaction(session, tscfg): Invalid argument
      process aborting
      Thread 2 starts at 3689348814741910322
      Thread 3 starts at 5534023222112865483
      Thread 4 starts at 7378697629483820644
      Thread 8 starts at 14757395258967641288
      Thread 7 starts at 12912720851596686127
      Thread 9 starts at 16602069666338596449
      Thread 6 starts at 11068046444225730966
      Thread 5 starts at 9223372036854775805
      

      The parent has code that is supposed to detect a dead child and stop waiting instead of hanging:

                      while (stat(statname, &sb) != 0 && kill(pid, 0) == 0)
                              sleep(1);
      

      The problem is that the child has terminated but is still a zombie: the parent has not reaped it, so its pid still exists, kill(pid, 0) keeps returning 0, and the loop never exits.

      (gdb) p pid
      $1 = 33984
      (gdb) detach
      Detaching from program: /mnt/data0/jenkins/workspace/wiredtiger-test-recovery-stress/build_posix/test/csuite/test_timestamp_abort, process 33981
      (gdb) attach 33984
      Attaching to program: /mnt/data0/jenkins/workspace/wiredtiger-test-recovery-stress/build_posix/test/csuite/test_timestamp_abort, process 33984
      warning: process 33984 is a zombie - the process has already terminated
      ptrace: Operation not permitted.
      
      jenkins  33984  0.0  0.0      0     0 ?        Z    Oct29   0:00 [test_timestamp_] <defunct>
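
      One possible way to make the wait loop robust against a zombie child is to poll with waitpid and WNOHANG instead of kill(pid, 0). The sketch below is illustrative only and is not the change made for this ticket: the helper names and the "/nonexistent/stat_file" path are hypothetical, and it assumes the parent is the direct parent of pid.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not the actual test code: wait until either the
 * file named statname exists or the child pid has terminated.
 * Returns 0 if the file appeared, 1 if the child exited first, -1 on error.
 */
static int
wait_for_file_or_child_exit(const char *statname, pid_t pid, int *statusp)
{
	struct stat sb;
	pid_t ret;

	for (;;) {
		if (stat(statname, &sb) == 0)
			return (0);
		/*
		 * With WNOHANG, waitpid returns 0 while the child is still
		 * running. Once the child has terminated, waitpid returns
		 * its pid and reaps it, so a zombie is not mistaken for a
		 * live process the way it is with kill(pid, 0).
		 */
		ret = waitpid(pid, statusp, WNOHANG);
		if (ret == pid)
			return (1);
		if (ret == -1)
			return (-1);
		sleep(1);
	}
}

/* Demo: fork a child that aborts right away, as the failing test did. */
static int
demo_aborting_child(int *statusp)
{
	pid_t pid;

	if ((pid = fork()) == -1)
		return (-1);
	if (pid == 0)
		abort();	/* Child: terminate with SIGABRT. */
	/* Assumed: this path never exists, so only the child exit ends the wait. */
	return (wait_for_file_or_child_exit(
	    "/nonexistent/stat_file", pid, statusp));
}
```

      Calling demo_aborting_child(&status) from a test program returns once the child has aborted, and WIFSIGNALED(status) and WTERMSIG(status) can then be used to report why the child died. Because waitpid reaps the child, no defunct entry is left behind.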
      

            Assignee:
            sue.loverso@mongodb.com Susan LoVerso
            Reporter:
            sue.loverso@mongodb.com Susan LoVerso