Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-1522

test/format hung in eviction

    • Type: Icon: Task Task
    • Resolution: Cannot Reproduce
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None

      I noticed that wiredtiger-test-format had been running on tinderbox for 4 days. It is hung. The config:

      ############################################
      #  RUN PARAMETERS
      ############################################
      abort=0
      auto_throttle=1
      firstfit=0
      bitcnt=5
      bloom=1
      bloom_bit_count=17
      bloom_hash_count=16
      bloom_oldest=0
      cache=1
      checkpoints=1
      checksum=uncompressed
      chunk_size=8
      compaction=0
      compression=none
      data_extend=0
      data_source=file
      delete_pct=31
      dictionary=0
      evict_max=0
      file_type=row-store
      backups=1
      huffman_key=0
      huffman_value=0
      insert_pct=7
      internal_key_truncation=1
      internal_page_max=17
      isolation=read-committed
      key_gap=4
      key_max=122
      key_min=17
      leak_memory=0
      leaf_page_max=10
      logging=0
      logging_archive=1
      logging_prealloc=0
      logging=0
      lsm_worker_threads=3
      merge_max=6
      mmap=1
      ops=1000
      prefix_compression=1
      prefix_compression_min=0
      repeat_data_pct=64
      reverse=0
      rows=10000
      runs=50
      split_pct=80
      statistics=1
      threads=1
      timer=0
      value_max=1107
      value_min=18
      wiredtiger_config=
      write_pct=50
      ############################################
      

      Attaching to the process here are the stacks:

      (gdb) thread apply all bt
      Thread 3 (Thread 0x7f013687d700 (LWP 127010)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      WT-1  0x00000000004354af in __wt_cond_wait (session=session@entry=0x1cc7b40, cond=0x1ce2a10, usecs=usecs@entry=100000) at ../src/os_posix/os_mtx_cond.c:77
      WT-2  0x000000000042402e in __evict_server (arg=0x1cc7b40) at ../src/evict/evict_lru.c:195
      WT-3  0x0000003f17e07ee5 in start_thread (arg=0x7f013687d700) at pthread_create.c:309
      WT-4  0x0000003f176f4b8d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 2 (Thread 0x7f013607c700 (LWP 127011)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      WT-1  0x00000000004354af in __wt_cond_wait (session=session@entry=0x1cc7e00, cond=0x1cf9250, usecs=usecs@entry=10000000) at ../src/os_posix/os_mtx_cond.c:77
      WT-2  0x000000000041c744 in __sweep_server (arg=0x1cc7e00) at ../src/conn/conn_sweep.c:122
      WT-3  0x0000003f17e07ee5 in start_thread (arg=0x7f013607c700) at pthread_create.c:309
      WT-4  0x0000003f176f4b8d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 1 (Thread 0x7f0136a8a740 (LWP 94803)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      WT-1  0x00000000004354af in __wt_cond_wait (session=session@entry=0x1cc80c0, cond=0x1ce2a80, usecs=usecs@entry=100000) at ../src/os_posix/os_mtx_cond.c:77
      WT-2  0x00000000004c523c in __wt_cache_full_check (session=0x1cc80c0) at ../src/include/cache.i:166
      WT-3  __cursor_enter (session=0x1cc80c0) at ../src/include/cursor.i:62
      WT-4  __curfile_enter (cbt=<optimized out>, cbt=<optimized out>) at ../src/include/cursor.i:97
      WT-5  __cursor_func_init (reenter=1, cbt=0x24a2c50) at ../src/include/cursor.i:184
      WT-6  __wt_btcur_search (cbt=cbt@entry=0x24a2c50) at ../src/btree/bt_cursor.c:307
      WT-7  0x0000000000493533 in __curfile_search (cursor=0x24a2c50) at ../src/cursor/cur_file.c:166
      WT-8  0x00000000004127a4 in read_row (keyno=2, key=0x7fff0744e670, cursor=0x24a2c50) at ../../../test/format/ops.c:601
      WT-9  wts_read_scan () at ../../../test/format/ops.c:560
      WT-10 0x000000000040d8d7 in main (argc=<optimized out>, argv=<optimized out>) at ../../../test/format/t.c:187
      

      This is a tiny cache of 1Mb.

      (gdb) p pages_evicted
      $2 = 2166
      (gdb) p *cache
      $3 = {bytes_inmem = 6458665, pages_inmem = 2169, bytes_evict = 5202306, pages_evict = 2166, bytes_dirty = 1891, pages_dirty = 1, read_gen = 4450648, 
        evict_cond = 0x1ce2a10, evict_lock = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {
                __prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, counter = 0, name = 0x65e6a1 "cache eviction", id = 0 '\000', 
          initialized = 1 '\001'}, evict_walk_lock = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, 
              __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, counter = 0, name = 0x65e6b0 "cache walk", id = 0 '\000', 
          initialized = 1 '\001'}, evict_waiter_cond = 0x1ce2a80, eviction_trigger = 95, eviction_target = 80, eviction_dirty_target = 80, evict = 0x1ce2af0, 
        evict_current = 0x0, evict_candidates = 0, evict_entries = 0, evict_max = 0, evict_slots = 400, evict_file_next = 0x0, sync_request = 0, sync_complete = 0, 
        cp_saved_evict = 0, cp_current_evict = 0, cp_skip_count = 0, cp_reserved = 0, cp_session = 0x0, cp_tid = 0, flags = 16}
      (gdb) p conn->cache_size
      $5 = 1048576
      

      We have one, very large (larger than cache), internal page that says it is already on disk:

      (gdb) p *ref->page
      $32 = {u = {intl = {recno = 0, parent_ref = 0x1cfa0c0, __index = 0x24e6920}, row = {d = 0x0, ins = 0x1cfa0c0, upd = 0x24e6920, entries = 0}, col_fix = {
            recno = 0, bitf = 0x1cfa0c0 "\360:\373\001", entries = 38693152}, col_var = {recno = 0, d = 0x1cfa0c0, repeats = 0x24e6920, nrepeats = 0, entries = 0}}, 
        dsk = 0x1fb3c00, modify = 0x0, read_gen = 0, memory_footprint = 1254123, type = 6 '\006', flags_atomic = 2 '\002'}
      (gdb) p *ref
      $35 = {page = 0x0, home = 0x1fb3af0, ref_hint = 0, state = WT_REF_DISK, addr = 0x1fb3c2a, key = {recno = 4294967379, ikey = 0x100000053}, page_del = 0x0}
      

            Assignee:
            Unassigned Unassigned
            Reporter:
            sue.loverso@mongodb.com Susan LoVerso
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated:
              Resolved: