While investigating WT-4608, keith.bostic and I agreed in conversation that verify should not prevent eviction from removing pages from the cache.
Here's the order of operations in the verify:
- We are in a loop in bt_vrfy.c:__wt_verify() for each checkpoint in a file.
- First we call bt_handle.c:__wt_btree_open(), which calls __wt_evict_file_exclusive_on().
- Turning exclusive on sets btree->evict_disabled = true.
- Then the open function sets btree->evict_disabled_open = true.
- However, immediately after returning from the open, __wt_verify calls __wt_evict_file_exclusive_off().
- Then the bulk of the work is done in __verify_tree.
- Then __wt_verify turns exclusive back on.
- And finally __wt_verify calls __wt_evict_file(DISCARD) to discard the file.
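The sequence above can be sketched as a toy model to show in which phases eviction is nominally allowed. This is not WiredTiger code: `toy_btree`, the `toy_*` functions, and the phase names are hypothetical stand-ins, and the model assumes eviction is allowed exactly when `evict_disabled` is false. It does not model whatever effect `evict_disabled_open` has.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the btree handle; only the two fields named above. */
struct toy_btree {
    bool evict_disabled;      /* set when exclusive access is turned on */
    bool evict_disabled_open; /* set by the open path; its effect is not modeled */
};

/* Hypothetical phase names for one pass through the per-checkpoint loop. */
enum toy_phase { PHASE_OPEN, PHASE_VERIFY_TREE, PHASE_DISCARD, PHASE_COUNT };

static void toy_exclusive_on(struct toy_btree *b) { b->evict_disabled = true; }
static void toy_exclusive_off(struct toy_btree *b) { b->evict_disabled = false; }

/* Mirrors the described open: exclusive on, then mark disabled-at-open. */
static void toy_btree_open(struct toy_btree *b)
{
    toy_exclusive_on(b);
    b->evict_disabled_open = true;
}

/* Walk one checkpoint and record whether eviction is allowed in each phase. */
static void toy_verify_one_checkpoint(bool can_evict[PHASE_COUNT])
{
    struct toy_btree b = {false, false};

    toy_btree_open(&b);
    assert(b.evict_disabled_open);
    can_evict[PHASE_OPEN] = !b.evict_disabled;

    toy_exclusive_off(&b); /* immediately after the open returns */
    can_evict[PHASE_VERIFY_TREE] = !b.evict_disabled;

    toy_exclusive_on(&b); /* before discarding the file */
    can_evict[PHASE_DISCARD] = !b.evict_disabled;
}
```

Under these assumptions the tree-walk phase is the only one in which eviction is not blocked, which is why the lack of eviction during that phase is the surprising part.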
In the stuck clean cache failure, we see the cache at over 400% full. Verify is in the discard phase, so at that point eviction cannot evict anything. So the questions are:
- Why is eviction not happening during the __verify_tree portion?
- Why is the verify thread itself not getting pulled into eviction during __verify_tree when it reads in pages? That thread should get pulled into eviction work during __wt_page_in_func (bt_read.c:675). I don't see a path where the verify thread is setting WT_SESSION_IGNORE_CACHE_SIZE.
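The expectation behind the second question can be written as a small predicate. This is a sketch only: the structs, the `TOY_SESSION_IGNORE_CACHE_SIZE` flag value, and the 100% threshold are hypothetical and are not taken from WiredTiger. It assumes a reading thread is asked to help with eviction whenever the cache is at or over the threshold and the session has not opted out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the real session flag; the value is arbitrary. */
#define TOY_SESSION_IGNORE_CACHE_SIZE 0x1u

struct toy_session {
    uint32_t flags;
};

struct toy_cache {
    uint64_t bytes_inuse;
    uint64_t bytes_max;
};

/* Cache fullness as a percentage of the configured maximum. */
static unsigned toy_cache_pct_full(const struct toy_cache *c)
{
    assert(c->bytes_max != 0);
    return (unsigned)(c->bytes_inuse * 100 / c->bytes_max);
}

/*
 * Assumed rule: a thread reading a page helps evict when the cache is at
 * least 100% full (hypothetical threshold), unless its session ignores
 * the cache size.
 */
static bool toy_reader_should_evict(const struct toy_session *s, const struct toy_cache *c)
{
    if (s->flags & TOY_SESSION_IGNORE_CACHE_SIZE)
        return false;
    return toy_cache_pct_full(c) >= 100;
}
```

With a cache at 400% and no opt-out flag on the session, this predicate is true, which matches the expectation that the verify thread should have been doing eviction work.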
Related to: WT-4608 Cache stuck with clean pages for LSM data format testing (Closed)