Type: Task
Resolution: Done
Affects Version/s: None
Component/s: None
I'm working on getting wtperf to run in a Riak-like configuration so that I can look at the issues we've been seeing there without a lot of layers. I created a tiny version of test1. Really, that means I added conn_config and table_config values that represent what we do in Riak, and modeled the key and value sizes on basho_bench test1 (40-byte keys and 1000-byte values in this case).
Here's the wtperf config file. I scaled it to 1% of the entries (5M instead of 500M), 10% of the populate threads (10 versus 100), and 25% of the cache (5GB instead of 21GB). This config hangs on the AWS SSD box before completing, I'm guessing when the cache fills up.
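The scaled-down numbers can be sanity-checked with a quick sketch. The full-size figures (500M entries, 100 populate threads, 21GB cache) are the ones quoted above; everything else is arithmetic.

```python
# Scale the full-size test1 figures down by the ratios described above.
full = {"icount": 500_000_000, "populate_threads": 100, "cache_gb": 21}

scaled = {
    "icount": int(full["icount"] * 0.01),                # 1% of the entries
    "populate_threads": full["populate_threads"] // 10,  # 10% of the threads
    "cache_gb": round(full["cache_gb"] * 0.25),          # ~25% of the cache
}

print(scaled)  # {'icount': 5000000, 'populate_threads': 10, 'cache_gb': 5}
```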
This was a tiny configuration as a sanity check, in anticipation of running a full test1 with 500M entries.
conn_config="cache_size=5G,checkpoint_sync=false,mmap=false,session_max=1024"
table_config="internal_page_max=128K,lsm=(bloom_config=(leaf_page_max=8MB),bloom_bit_count=28,bloom_hash_count=19,bloom_oldest=true,chunk_size=100MB,merge_threads=2),type=lsm"
icount=5000000
populate_threads=10
key_sz=40
value_sz=1000
report_interval=5
Here's the pmp output from the hang:
10 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__wt_cache_full_check,__clsm_enter,__clsm_insert,populate_thread,start_thread,clone
 2 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__wt_lsm_merge_worker,start_thread,clone
 1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__wt_cache_evict_server,start_thread,clone
 1 nanosleep,usleep,execute_populate,main
 1 __memcmp_sse4_1,__wt_ovfl_reuse_search,__rec_cell_build_ovfl,__rec_cell_build_val,__rec_row_leaf_insert,__rec_row_leaf,__wt_rec_write,__wt_sync_file,__wt_bt_cache_op,__wt_lsm_checkpoint_worker,start_thread,clone
 1 __lll_lock_wait,_L_lock_927,pthread_mutex_lock,__wt_spin_lock,__wt_conn_btree_sync_and_close,__wt_session_release_btree,__curbulk_close,__wt_bloom_finalize,__lsm_bloom_create,__lsm_bloom_work,__wt_lsm_merge_worker,start_thread,clone
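For readers unfamiliar with pmp (the "poor man's profiler"): it snapshots every thread's backtrace with gdb and collapses identical stacks into a single counted line, which is why each line above starts with a thread count. The aggregation step can be sketched like this (the stacks below are abbreviated versions of the output above, for illustration only):

```python
from collections import Counter

# Each thread's backtrace, innermost frame first, as pmp collects it.
# Abbreviated from the ticket's output; frame lists are illustrative.
stacks = (
    [["pthread_cond_timedwait", "__wt_cond_wait", "__wt_cache_full_check",
      "__clsm_enter", "__clsm_insert", "populate_thread"]] * 10
    + [["pthread_cond_timedwait", "__wt_cond_wait", "__wt_lsm_merge_worker"]] * 2
    + [["nanosleep", "usleep", "execute_populate", "main"]]
)

# Collapse identical stacks and print them pmp-style: "count frame,frame,..."
counts = Counter(",".join(s) for s in stacks)
for stack, n in counts.most_common():
    print(f"{n:>3} {stack}")
```

Reading the real output this way: 10 populate threads are parked in __wt_cache_full_check, one merge worker is blocked on a mutex, and the checkpoint worker is the only thread doing work.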
The merge thread is waiting on the checkpoint lock, which the checkpoint thread is presumably holding. (With a smaller cache, it hangs in the same way, much more quickly.)