We uncovered a memory leak from reconciliation, with the following signature:
==6758==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 9374 byte(s) in 7 object(s) allocated from:
#0 0x4984bd in malloc
#1 0x7ff1c5ae53e2 in __wt_malloc rc/os_common/os_alloc.c:91:14
#2 0x7ff1c5ae5f51 in __wt_memdup src/os_common/os_alloc.c:249:5
#3 0x7ff1c5b43229 in __rec_split_write src/reconcile/rec_write.c:2204:9
#4 0x7ff1c5b44962 in __wt_rec_split_finish src/reconcile/rec_write.c:1706:13
#5 0x7ff1c5b299de in __wt_rec_row_leaf src/reconcile/rec_row.c:1002:11
#6 0x7ff1c5b3fb73 in __reconcile src/reconcile/rec_write.c:258:9
#7 0x7ff1c5b3eeb2 in __wt_reconcile src/reconcile/rec_write.c:98:11
#8 0x7ff1c5a596f4 in __evict_review src/evict/evict_page.c:731:9
#9 0x7ff1c5a573e1 in __wt_evict src/evict/evict_page.c:168:5
#10 0x7ff1c5a47b9d in __evict_page src/evict/evict_lru.c:2334:5
#11 0x7ff1c5a4431a in __evict_lru_pages src/evict/evict_lru.c:1150:20
The memory leak is associated with a new failpoint that was added in WT-9252 and disabled in WT-9711. The failpoint is:
--- a/src/evict/evict_page.c
+++ b/src/evict/evict_page.c
@@ -760,10 +760,17 @@ __evict_review(WT_SESSION_IMPL *session, WT_REF *ref, uint32_t evict_flags, bool
       !__wt_page_is_modified(page) || LF_ISSET(WT_REC_HS | WT_REC_IN_MEMORY) ||
         WT_IS_METADATA(btree->dhandle));

+    /* Fail 0.1% of the time. */
+    if (!closing &&
+      __wt_failpoint(session, WT_TIMING_STRESS_FAILPOINT_EVICTION_FAIL_AFTER_RECONCILIATION, 10))
+        return (EBUSY);
+
     return (0);
 }
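For readers unfamiliar with failpoints, the following is a minimal standalone sketch of the idea, not WiredTiger's implementation. It assumes the final argument is a probability out of 10,000, which is consistent with the "0.1%" comment next to the value 10. The names failpoint_should_fire, next_random and FAILPOINT_SCALE are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: probabilities are expressed out of 10,000, so 10 means 0.1%. */
#define FAILPOINT_SCALE 10000u

/* Small xorshift PRNG so the sketch carries its own state; the seed must be non-zero. */
static uint32_t
next_random(uint32_t *state)
{
    uint32_t x = *state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return (*state = x);
}

/*
 * Hypothetical failpoint check: returns true roughly probability/10000 of the time, and never
 * when the failpoint is disabled.
 */
static bool
failpoint_should_fire(bool enabled, uint32_t probability, uint32_t *rng_state)
{
    if (!enabled || probability == 0)
        return (false);
    return (next_random(rng_state) % FAILPOINT_SCALE < probability);
}
```

A caller would return an error such as EBUSY when the check fires, which forces the rarely taken failure path to run under test.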
The purview of this ticket is to:
- Understand and fix the root cause of the memory leak.
- Restructure the code in evict_page.c so its control flow is more obvious, and introduce documented constraints if there are times when an eviction is not allowed to fail (hopefully there are none, but the cleanup is worthwhile regardless).
- Consider updating how memory is tracked across reconciliations to ensure cleanup is obviously correct.
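The root cause is not yet established, so the sketch below only illustrates the general hazard the items above are concerned with: memory duplicated during reconciliation must be released if eviction fails afterwards. Every name here (rec_result, reconcile, rec_result_discard, evict_review, inject_failure) is hypothetical and does not correspond to WiredTiger's actual structures or functions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the output of reconciling a page. */
struct rec_result {
    void *image; /* Copy of the page image, owned by this struct. */
    size_t size;
};

/* Duplicate the source image, in the way a memdup-style helper would. */
static int
reconcile(struct rec_result *r, const void *src, size_t size)
{
    if ((r->image = malloc(size)) == NULL)
        return (ENOMEM);
    memcpy(r->image, src, size);
    r->size = size;
    return (0);
}

/* Single place that releases everything reconciliation allocated. */
static void
rec_result_discard(struct rec_result *r)
{
    free(r->image);
    r->image = NULL;
    r->size = 0;
}

/*
 * Review a page for eviction. The inject_failure flag models a failpoint firing after
 * reconciliation has succeeded; the result is discarded before returning EBUSY.
 */
static int
evict_review(struct rec_result *r, const void *src, size_t size, bool inject_failure)
{
    int ret;

    if ((ret = reconcile(r, src, size)) != 0)
        return (ret);
    if (inject_failure) {
        rec_result_discard(r); /* Omitting this call would leak r->image. */
        return (EBUSY);
    }
    return (0);
}
```

Routing every post-reconciliation failure through one discard function is one way to make the cleanup "obviously correct"; whether that matches the real fix depends on the root-cause analysis.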