  1. WiredTiger
  2. WT-447

Lock up when running with 16 threads using Btree

    • Type: Task
    • Resolution: Done
    • WT1.6.4
    • Affects Version/s: None
    • Component/s: None

      Hi all,

      Here is some more information on why we lock up or crawl along very slowly (the two are hardly distinguishable in this case) when we run the LevelDB benchmark using Btree with 16 threads.

      To reproduce:

      env LD_LIBRARY_PATH=../dbg-wt-dev-branch/build_posix/.libs:../dbg-wt-dev-branch/build_posix/ext/compressors/snappy/.libs/ TEST_TMPDIR="" ./db_bench_wiredtiger --cache_size=134217728 --use_lsm=0 --threads=16 --db=/tmpfs/leveldb --benchmarks=fillseq

      In my experience, the benchmark will run ok until it hits roughly 600,000 ops, and then it begins crawling.

      An exercise with GDB revealed that the worker threads spend most of their time in __wt_page_in_func() in src/btree/bt_page.c. From what I understand, most of the time a thread is in that function it can't find the page, so it yields by calling __wt_yield() (at line 100 of that file).

      I think this is what's happening because when I set breakpoints at __wt_page_in_func() and at __wt_yield() and counted the hits, the counts were roughly the same, as shown here:

      Num     Type           Disp Enb Address            What
      1       breakpoint     keep y   0x00007ffff795fbfd in __wt_page_in_func at ../src/btree/bt_page.c:67
              breakpoint already hit 21953 times
              ignore next 978088 hits
      2       breakpoint     keep y   0x00007ffff79a1450 in __wt_yield at ../src/os_posix/os_yield.c:17
              breakpoint already hit 21920 times
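
To illustrate why nearly equal hit counts point to a retry-and-yield pattern, here is a minimal sketch in C. It is not WiredTiger code: `page_slot`, `page_in_stats`, `try_acquire` and `page_in_sketch` are hypothetical names, and the "page becomes available after N failed attempts" behavior is a simulation standing in for eviction freeing cache space. It also assumes the first breakpoint is hit once per attempt, which the listing above does not confirm.

```c
#include <assert.h>   /* assert */
#include <sched.h>    /* sched_yield (POSIX) */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached page reference; not a WiredTiger type. */
struct page_slot {
    bool in_memory;          /* true once the page can be returned */
    unsigned attempts_left;  /* simulated: failed attempts before space frees up */
};

/* Counters playing the role of the two breakpoint hit counts. */
struct page_in_stats {
    unsigned long attempts;  /* every try to get the page */
    unsigned long yields;    /* every time the thread gave up the CPU */
};

/* Simulated lookup: fails until the (imaginary) eviction has caught up. */
static bool
try_acquire(struct page_slot *slot)
{
    if (slot->in_memory)
        return true;
    if (slot->attempts_left == 0) {
        slot->in_memory = true;
        return true;
    }
    --slot->attempts_left;
    return false;
}

/* Retry until the page is available, yielding after each failed attempt. */
static void
page_in_sketch(struct page_slot *slot, struct page_in_stats *stats)
{
    assert(slot != NULL && stats != NULL);
    for (;;) {
        ++stats->attempts;
        if (try_acquire(slot))
            return;
        ++stats->yields;
        (void)sched_yield();
    }
}
```

In this sketch a request that fails N times records N + 1 attempts and N yields, so when almost every attempt fails the two counters stay close to each other, much like the 21953 and 21920 hits reported above.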

      So this could be a similar issue to what we saw last week: when we don't have enough space in the cache, eviction can't keep up and we crawl. The performance charts support this hypothesis. If you compare the performance results for <a href="http://www2.cs.sfu.ca/~fedorova/temp/WT-JAN-25/big/big.html">Big</a> and <a href="http://www2.cs.sfu.ca/~fedorova/temp/WT-JAN-25/big512/big512.html">Big512</a>, which are the same benchmark with 128MB and 512MB caches respectively, you will see that Btree locks up (or crawls slowly) in the 128MB config, but NOT in the 512MB config.

      Hope this helps. Please let me know if I can provide more information. If you don't have access to a machine with 16 cores and cannot reproduce this on a smaller machine, please let me know and I'll give you access to a server in my lab.

            Assignee:
            sue.loverso@mongodb.com Susan LoVerso
            Reporter:
            fedorova Alexandra (Sasha) Fedorova