Michael, we're leaking the leaf page when we open a new tree and then close it without modifying it.
The problem is that this is the only case where a clean page has a child that requires a merge (it just happens to be a merge of an empty page), and we're not handling that case correctly during eviction.
There was a change you made a couple of months ago that I think is key here, and I wondered if you recalled what the change was about?
Basically, when we create an empty tree, we're marking the leaf page "empty" but "clean", and I think that's wrong.
I think that we should mark the leaf page dirty, but without contents, because then it gets reconciled like any other page, and during reconciliation, we figure out that it's empty and nothing needs to be written.
We already have to handle the case where we create a tree, insert an item and then delete it; that shouldn't write any pages either. So my inclination is to mark the leaf page dirty but otherwise leave the page alone, and make those two cases work the same way.
We could modify eviction to catch it, but that seems like the wrong place to have a special case.
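As a rough illustration of the behavior I'm after, here is a toy model, not WiredTiger code. Every name in it (toy_page, toy_open_empty, toy_reconcile, toy_evict and so on) is made up for the example, and it assumes reconciliation writes one block for any non-empty page and nothing for an empty one. The point is only that a freshly created tree and a tree that had an item inserted and then deleted take the same path through eviction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a leaf page; this is NOT the real WT_PAGE structure. */
struct toy_page {
	size_t entries;		/* Live items on the page */
	bool dirty;		/* Page must be reconciled before discard */
	bool freed;		/* Eviction has released the page */
};

/* Create the leaf of an empty tree: no contents, but marked dirty. */
static struct toy_page
toy_open_empty(void)
{
	struct toy_page page = { 0, true, false };
	return (page);
}

/* Insert one item; any modification dirties the page. */
static void
toy_insert(struct toy_page *page)
{
	assert(page != NULL);
	++page->entries;
	page->dirty = true;
}

/* Remove one item, if there is one; the page stays dirty. */
static void
toy_remove(struct toy_page *page)
{
	assert(page != NULL);
	if (page->entries > 0)
		--page->entries;
	page->dirty = true;
}

/*
 * Reconcile a page and return the number of blocks written. In this toy
 * model an empty page writes nothing and a non-empty page writes one block.
 */
static size_t
toy_reconcile(struct toy_page *page)
{
	assert(page != NULL);
	page->dirty = false;
	return (page->entries == 0 ? 0 : 1);
}

/*
 * Evict a page: dirty pages are reconciled first, then the page is freed.
 * There is no special case for "empty but clean", because an empty leaf is
 * always dirty in this model.
 */
static size_t
toy_evict(struct toy_page *page)
{
	size_t written;

	assert(page != NULL);
	written = page->dirty ? toy_reconcile(page) : 0;
	page->freed = true;
	return (written);
}
```

In this sketch, evicting a page straight from toy_open_empty() and evicting one after a toy_insert() followed by a toy_remove() both go through toy_reconcile(), write nothing, and end up freed.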
diff --git a/src/btree/bt_handle.c b/src/btree/bt_handle.c
index 9228b48..f804ac1 100644
--- a/src/btree/bt_handle.c
+++ b/src/btree/bt_handle.c
@@ -252,6 +252,10 @@ __btree_tree_open_empty(WT_SESSION_IMPL *session)
 	root = leaf = NULL;
 
 	/*
+	 * A note about empty trees: the initial tree is a root page and a leaf
+	 * page, the leaf of which is marked dirty. If evicted without being
+	 * modified, that's OK, nothing will ever be written.
+	 *
 	 * Create a leaf page -- this can be reconciled while the root stays
 	 * pinned.
 	 */
@@ -272,10 +276,6 @@ __btree_tree_open_empty(WT_SESSION_IMPL *session)
 	leaf->entries = 0;
 
 	/*
-	 * A note about empty trees: the initial tree is a root page and a leaf
-	 * page, neither of which are marked dirty. If evicted without being
-	 * modified, that's OK, nothing will ever be written.
-	 *
 	 * Create the empty root page.
 	 *
 	 * !!!
@@ -317,12 +317,11 @@ __btree_tree_open_empty(WT_SESSION_IMPL *session)
 	btree->root_page = root;
 
 	/*
-	 * Mark the child page empty so that if it is evicted, the tree ends up
-	 * sane. The page should not be dirty, else we would write empty trees
-	 * on close, including empty checkpoints.
+	 * Mark the leaf page dirty; if it's evicted, it will be reconciled,
+	 * but if it's still empty, nothing will be written.
 	 */
 	WT_ERR(__wt_page_modify_init(session, leaf));
-	F_SET(leaf->modify, WT_PM_REC_EMPTY);
+	__wt_page_modify_set(leaf);
 
 	return (0);
This change seems to run, but I vaguely recall that the problem had to do with eviction under load (although I don't think that should be connected with empty pages).
Anyway, I know we've made repeated changes in this area, so I'm wondering if you remember any of the details?
I won't push this change until I hear back from you.
Related to: WT-285 Change to identify read-only objects (Closed)