We noticed that large pages in the metadata file can make checkpoints complete very slowly. We also do not attempt update-restore eviction of metadata pages, which limits how often metadata pages can be successfully evicted and could lead to performance issues.
A wtperf workload that demonstrates the slow checkpoint behavior is:
$ cat bench/wtperf/runners/metadata-split-test.wtperf
# Create a set of tables with uneven distribution of data
conn_config="cache_size=1G,eviction=(threads_max=8),file_manager=(close_idle_time=100000),checkpoint=(wait=2000,log_size=2GB),statistics_log=(wait=1,json,on_close),session_max=1000"
table_config="type=file,app_metadata=\"this_is_a_fairly_long_string_to_cause_splits_in_metadata_more_often_abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyabcdefghijklmnopqrstuvwxyabcdefghijklmnopqrstuvwxyabcdefghijklmnopqrstuvwxyabcdefghijklmnopqrstuvwxyabcdefghijklmnopqrstuvwxyabcdefghijklmnopqrstuvwxyabcdefghijklmnopqrstuvwxyzzzzzzzz\""
table_count=2000
icount=0
random_range=1000000000
pareto=10
range_partition=true
report_interval=5
run_ops=0
populate_threads=0
icount=0
It should be relatively straightforward to translate that into a Python test case. The difficulty is that the symptom is the checkpoint on close taking a long time, and asserting on "a long time" in a Python test is traditionally hard to make robust in automated testing. We would need to look for a different signal that the behavior is wrong, for example a statistic rather than wall-clock time, as sketched below.
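As a rough illustration only, a standalone Python sketch could create many tables with long app_metadata through the wiredtiger Python API and then read a checkpoint timing statistic as the signal instead of measuring elapsed time. The directory name, table names, padding length, the statistics=(all) setting and the 5 second threshold below are assumptions for the sketch, not part of the workload above.

# Sketch: bloat the metadata with long app_metadata strings, then check the
# checkpoint timing statistic rather than wall-clock time.
import os
import wiredtiger
from wiredtiger import stat

home = 'WT_METADATA_SPLIT_TEST'        # scratch directory (assumed name)
os.mkdir(home)

conn = wiredtiger.wiredtiger_open(home,
    'create,cache_size=1G,statistics=(all)')
session = conn.open_session()

# A long app_metadata string makes metadata entries large, so the metadata
# file splits more often, mirroring the wtperf table_config above.
padding = 'x' * 300
for i in range(2000):
    session.create('table:metadata_split_%04d' % i,
        'key_format=S,value_format=S,app_metadata="%s"' % padding)

session.checkpoint()

# Use the maximum checkpoint time statistic as the failure signal; the
# 5000 msec threshold is an arbitrary assumption for illustration.
stat_cursor = session.open_cursor('statistics:', None, None)
max_msecs = stat_cursor[stat.conn.txn_checkpoint_time_max][2]
stat_cursor.close()
assert max_msecs < 5000, 'checkpoint took %d msecs' % max_msecs

conn.close()

Even this is not a complete answer, since a statistic threshold can still be flaky across machines; it only shows the shape of using a statistic instead of timing the close call.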
- is depended on by: SERVER-41824 Collection creation becomes very slow and has extended stalls (Closed)
- is duplicated by: WT-4883 High volume writes to metadata can cause stalls (Closed)