-
Type: Task
-
Resolution: Done
-
None
-
Affects Version/s: None
-
Component/s: None
While running log-append.wtperf on Jenkins wtperf effectively hung in __wt_page_in_func. Although it was running the code from WT-1774, I do not believe the log code is involved in this issue.
The test is running on the Jenkins AWS box, a m2.4xlarge, equivalent to my AWS HDD. I ended up killing it after gathering some information.
A normal test run completes in about 169 seconds and the test.stat file looks like this:
1687998 populate inserts (9659745 of 50000000) in 5 secs (35 total secs) 1998511 populate inserts (11658256 of 50000000) in 5 secs (40 total secs) 1932159 populate inserts (13590415 of 50000000) in 5 secs (45 total secs) 1986784 populate inserts (15577199 of 50000000) in 5 secs (50 total secs)
This test run hit some kind of problem very quickly and never recovered. Its test.stat file looks like:
157894 populate inserts (157894 of 50000000) in 5 secs (5 total secs) 503 populate inserts (158397 of 50000000) in 5 secs (10 total secs) 504 populate inserts (158901 of 50000000) in 5 secs (15 total secs) 506 populate inserts (159407 of 50000000) in 5 secs (20 total secs) ... 500 populate inserts (222988 of 50000000) in 5 secs (655 total secs) 512 populate inserts (223500 of 50000000) in 5 secs (660 total secs) 504 populate inserts (224004 of 50000000) in 5 secs (665 total secs)
After more than 10 minutes it had only done 224k of 50M operations. We are getting 100 operations per second!
I ran pmp and it shows this every time. Our 8 populate threads are sleeping and nothing else is going on:
8 select,__wt_sleep,__wt_page_in_func,__wt_row_search,__wt_btcur_insert,__curfile_insert,populate_thread,start_thread,clone 2 1 select,__wt_sleep,__evict_server,start_thread,clone 1 pthread_cond_wait@@GLIBC_2.3.2,__wt_cond_wait,__ckpt_server,start_thread,clone 1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__sweep_server,start_thread,clone 1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__log_wrlsn_server,start_thread,clone 1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__log_server,start_thread,clone 1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__log_close_server,start_thread,clone
The AWS console charts show virtually no I/O going on during this test's run time. It does show significant I/O at the times of the other runs of this test.