We should add a new debug_mode=[slow_checkpoint] configuration option to wiredtiger_open that causes checkpoints to run more slowly. One way to do that is to add a small sleep each time the checkpoint visits an internal page:
--- a/src/btree/bt_sync.c
+++ b/src/btree/bt_sync.c
@@ -300,6 +300,8 @@ __wt_sync_file(WT_SESSION_IMPL *session, WT_CACHE_OP syncop)
             if (WT_PAGE_IS_INTERNAL(page)) {
                 internal_bytes += page->memory_footprint;
                 ++internal_pages;
+                /* Slow down checkpoints */
+                __wt_sleep(0, 10000);
             } else {
                 leaf_bytes += page->memory_footprint;
                 ++leaf_pages;
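If the option were wired through, the sleep could be gated on a per-connection flag instead of being unconditional. A minimal sketch, assuming a hypothetical WT_CONN_DEBUG_SLOW_CKPT connection flag that would be set while parsing the new debug_mode option (F_ISSET, S2C and __wt_sleep are existing WiredTiger helpers; the flag name and configuration plumbing are illustrative only):

            if (WT_PAGE_IS_INTERNAL(page)) {
                internal_bytes += page->memory_footprint;
                ++internal_pages;
                /*
                 * Slow down checkpoints only when the debug mode has been
                 * configured; WT_CONN_DEBUG_SLOW_CKPT is a placeholder name
                 * for that flag.
                 */
                if (F_ISSET(S2C(session), WT_CONN_DEBUG_SLOW_CKPT))
                    __wt_sleep(0, 10000);
            } else {
                leaf_bytes += page->memory_footprint;
                ++leaf_pages;
            }

An application would then opt in at open time by passing the new option in the wiredtiger_open configuration string (using whatever final syntax the option takes, e.g. the debug_mode=[slow_checkpoint] form proposed above).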
It would be interesting to apply this change while running a workload with a stable, predictable level of throughput, and to observe the consequences of a long-running checkpoint on that workload. An example of a wtperf workload that would be interesting is:
conn_config="cache_size=1GB,session_max=1000,eviction=(threads_min=8,threads_max=8),log=(enabled=false),transaction_sync=(enabled=false),checkpoint_sync=false,checkpoint=(wait=10)"
table_config="allocation_size=1024,memory_page_max=30MB,prefix_compression=false,split_pct=90,leaf_page_max=32k,internal_page_max=1024,type=file"
# About 2.5 GB of data - more than fits in cache.
icount=25000000
table_count=3
log_like_table=true
report_interval=5
run_time=120
pareto=10
populate_threads=1
threads=((count=2,updates=1,throttle=5000),(count=4,reads=1),(count=1,reads=1,read_range=100000,throttle=1))
# Add throughput/latency monitoring
max_latency=2000
sample_interval=5
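Assuming the configuration above is saved to a file (say, slow-checkpoint.wtperf; the file name is arbitrary), the run could look something like:

./wtperf -h WT_TEST -O slow-checkpoint.wtperf

where -O reads wtperf options from the file and -h names the database home directory. The sampled throughput/latency output from the run can then be compared against a baseline run of the same workload without the added sleep.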
Is depended on by: WT-5332 Investigate the impact of slow checkpoints using the new debug mode (Backlog)