WiredTiger / WT-2176

Raw compression can create unreasonably large pages

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: WT2.8.0
    • Affects Version/s: None
    • Component/s: None

      This ticket has evolved. It looks like the biggest problem is that with zlib raw compression we are creating an on-disk page that is 2.5MB once uncompressed, while the cache size is only 1MB, so a page swap onto that page causes eviction to stall.
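
      A minimal sketch of the kind of configuration that exposes the mismatch (the extension path, page sizes and home directory below are illustrative assumptions, not the actual test/format settings):

      #include <wiredtiger.h>

      /*
       * Open a connection with a 1MB cache and create a zlib-compressed
       * column-store file.  With raw compression the page split decision is
       * driven by the compressed output size, so the uncompressed page image
       * can grow far beyond leaf_page_max -- here, to ~2.5MB, larger than
       * the entire cache.
       */
      static int
      open_small_cache_zlib(WT_CONNECTION **connp, WT_SESSION **sessionp)
      {
          int ret;

          if ((ret = wiredtiger_open("WT_HOME", NULL,
              "create,cache_size=1MB,"
              "extensions=[/usr/local/lib/libwiredtiger_zlib.so]",
              connp)) != 0)
              return (ret);
          if ((ret = (*connp)->open_session(
              *connp, NULL, NULL, sessionp)) != 0)
              return (ret);
          return ((*sessionp)->create(*sessionp, "file:wt",
              "key_format=r,value_format=u,"
              "block_compressor=zlib,leaf_page_max=32KB"));
      }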

      Old analysis follows:

      There is a test/format workload that has hung due to a full cache. The configuration has a single worker thread and a 1MB cache.

      The only application thread is:

      #2  0x000000000043196e in __wt_cond_wait (session=0x2b778b0, cond=0x2b75b50,
          usecs=100000) at ../src/include/misc.i:18
      #3  0x0000000000435d08 in __wt_cache_eviction_worker (session=0x2b778b0,
          busy=false, pct_full=328) at ../src/evict/evict_lru.c:1544
      #4  0x00000000004a5802 in __wt_cache_eviction_check (session=0x2b778b0,
          busy=false, didworkp=0x0) at ../src/include/cache.i:236
      #5  0x00000000004a5f54 in __wt_txn_begin (session=0x2b778b0, cfg=0x0)
          at ../src/include/txn.i:266
      #6  0x00000000004a5fd4 in __wt_txn_autocommit_check (session=0x2b778b0)
          at ../src/include/txn.i:287
      #7  0x00000000004a842f in __wt_page_in_func (session=0x2b778b0, ref=0x345a5c0,
          flags=0, file=0x6c2f45 "../src/btree/col_srch.c", line=93)
          at ../src/btree/bt_read.c:575
      #8  0x00000000004c237d in __wt_page_swap_func (session=0x2b778b0,
          held=0x2b75ec0, want=0x345a5c0, flags=0,
          file=0x6c2f45 "../src/btree/col_srch.c", line=93)
          at ../src/include/btree.i:1260
      #9  0x00000000004c2baa in __wt_col_search (session=0x2b778b0, recno=87641,
          leaf=0x0, cbt=0x7f23b0032d40) at ../src/btree/col_srch.c:93
      #10 0x0000000000516f4d in __cursor_col_search (session=0x2b778b0,
          cbt=0x7f23b0032d40, leaf=0x0) at ../src/btree/bt_cursor.c:226
      #11 0x0000000000518821 in __wt_btcur_remove (cbt=0x7f23b0032d40)
          at ../src/btree/bt_cursor.c:670
      #12 0x00000000004da976 in __curfile_remove (cursor=0x7f23b0032d40)
          at ../src/cursor/cur_file.c:331
      #13 0x00000000004131fa in col_remove (cursor=0x7f23b0032d40,
          key=0x7f23c891cdf0, keyno=87641, notfoundp=0x7f23c891cda4)
          at ../../../test/format/ops.c:1183
      #14 0x00000000004115fa in ops (arg=0x255cd50) at ../../../test/format/ops.c:426
      

      That is, it is an auto-commit transaction doing a cache-full check before allocating a transaction ID.
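
      The operation in frames #11-#14 is a cursor remove issued with no explicit transaction, so it runs at autocommit. A minimal sketch of that call shape (the session/cursor setup and record number are assumed; this is not the test/format code itself):

      #include <stdint.h>
      #include <wiredtiger.h>

      /*
       * Remove a record under an autocommit transaction: with no enclosing
       * begin_transaction, the remove hits the cache-full check in frames
       * #3-#5 before a transaction ID is ever allocated, and blocks there
       * while eviction cannot make progress.
       */
      static int
      remove_autocommit(WT_SESSION *session, uint64_t recno)
      {
          WT_CURSOR *cursor;
          int ret;

          if ((ret = session->open_cursor(
              session, "file:wt", NULL, NULL, &cursor)) != 0)
              return (ret);
          cursor->set_key(cursor, recno);    /* column-store key is a recno */
          ret = cursor->remove(cursor);
          (void)cursor->close(cursor);
          return (ret);
      }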

      There are 5 pages in cache, 848 bytes of which are on internal pages. Two of the pages belong to file:wt: one is a small internal page, the other is a 2.5MB leaf page.
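
      For scale, assuming pct_full in frame #3 is roughly the bytes in cache expressed as a percentage of the configured cache size (the real accounting inside eviction is more involved), the single leaf page on its own already exceeds the cache:

      #include <inttypes.h>
      #include <stdio.h>

      int
      main(void)
      {
          uint64_t cache_size = 1048576;    /* 1MB cache from the test config */
          uint64_t leaf_page = 2621440;     /* the ~2.5MB leaf page */

          /*
           * Prints 250: one page is 2.5x the whole cache, so eviction can
           * never get below any trigger while that page is in use.
           */
          printf("leaf page = %" PRIu64 "%% of cache\n",
              100 * leaf_page / cache_size);
          return (0);
      }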

      An oddity is that the session that is in __wt_txn_begin already has a snapshot allocated:

      (gdb) p session->txn
      $30 = {id = 0, isolation = WT_ISO_SNAPSHOT, snap_min = 4, snap_max = 4,
        snapshot = 0x7f23b0032960, snapshot_count = 0, txn_logsync = 0, mod = 0x0,
        mod_alloc = 0, mod_count = 0, logrec = 0x0, notify = 0x0, ckpt_lsn = {
          file = 0, offset = 0}, full_ckpt = false, ckpt_nsnapshot = 0,
        ckpt_snapshot = 0x0, flags = 8}
      

      That snapshot is keeping the system-wide snap_min pinned at 4:

      (gdb) p $3->txn_global
      $22 = {current = 4, last_running = 4, oldest_id = 4, scan_count = 0,
        checkpoint_id = 0, checkpoint_gen = 0, checkpoint_pinned = 0,
        nsnap_rwlock = 0x2b731b0, nsnap_oldest_id = 0, nsnaph = {tqh_first = 0x0,
          tqh_last = 0x2b6b478}, states = 0x2b94d80}
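
      Why the pinned snap_min matters: obsolete versions can only be freed once the update that superseded them is visible to every possible reader, i.e. once its ID precedes the oldest pinned ID. A hypothetical helper (a simplification, not WiredTiger's actual visibility code) to illustrate:

      #include <stdbool.h>
      #include <stdint.h>

      /* Hypothetical, simplified check -- not WiredTiger's implementation. */
      static bool
      update_globally_visible(uint64_t upd_txnid, uint64_t oldest_pinned_id)
      {
          /*
           * With snap_min/oldest_id pinned at 4, any update with ID >= 4 is
           * never globally visible, so the versions it supersedes cannot be
           * discarded and the cache cannot shrink.
           */
          return (upd_txnid < oldest_pinned_id);
      }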
      

            Assignee: backlog-server-execution [DO NOT USE] Backlog - Storage Execution Team
            Reporter: Alexander Gorrod (alexander.gorrod@mongodb.com)
            Votes: 0
            Watchers: 3
