Type: Task
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: Not Applicable
Each checkpoint has a set of extent lists, which track the blocks used by the checkpoint, and blocks available/discarded in the file. To support GC, we'll need more information about other objects (and blocks within them) referred to, directly or indirectly, by the checkpoint.
One likely solution is to add an extent list that travels with every checkpoint. It would hold a set of objectid/offset/size triplets indicating which pieces of past objects are in use, so it would look different from the avail list, which holds offset/size pairs. Would we need yet another extent list holding the avail or discard items for past objects? I don't think so, as we can never allocate more blocks from an older object. With the one new list we'll have what we need for GC: the most recent object will always have everything we need. Moreover, it should be straightforward to keep this list updated in the block manager.
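To make the shape of the proposed list concrete, here is a minimal sketch in Python rather than the block manager's own code. The names `ObjExtent` and `ObjExtentList` are hypothetical, and merging adjacent ranges is an assumption about how the list might be kept compact, not something this ticket specifies.

```python
from dataclasses import dataclass
from typing import List


@dataclass(order=True)
class ObjExtent:
    """One in-use range in a past object (hypothetical triplet layout)."""
    objectid: int
    offset: int
    size: int

    @property
    def end(self) -> int:
        return self.offset + self.size


class ObjExtentList:
    """Sorted list of objectid/offset/size triplets (illustrative only)."""

    def __init__(self) -> None:
        self.extents: List[ObjExtent] = []

    def add(self, objectid: int, offset: int, size: int) -> None:
        """Record a range as in use, merging touching or overlapping ranges."""
        new = ObjExtent(objectid, offset, size)
        kept: List[ObjExtent] = []
        for ext in self.extents:
            same_object = ext.objectid == new.objectid
            if same_object and ext.offset <= new.end and new.offset <= ext.end:
                # Grow the new extent to cover the existing one.
                start = min(ext.offset, new.offset)
                end = max(ext.end, new.end)
                new = ObjExtent(objectid, start, end - start)
            else:
                kept.append(ext)
        kept.append(new)
        kept.sort()
        self.extents = kept

    def remove(self, objectid: int, offset: int, size: int) -> None:
        """Drop a range that is no longer referenced, splitting if needed."""
        end = offset + size
        kept: List[ObjExtent] = []
        for ext in self.extents:
            if ext.objectid != objectid or ext.end <= offset or end <= ext.offset:
                kept.append(ext)  # No overlap with the removed range.
                continue
            if ext.offset < offset:  # Keep the piece before the removed range.
                kept.append(ObjExtent(objectid, ext.offset, offset - ext.offset))
            if end < ext.end:  # Keep the piece after the removed range.
                kept.append(ObjExtent(objectid, end, ext.end - end))
        kept.sort()
        self.extents = kept
```

Note that the sketch has no avail or discard counterpart for past objects, which matches the reasoning above: blocks are never allocated from an older object, so only the in-use ranges need to be tracked.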
Given the above approach, this ticket would seem to require a checkpoint format change, so we'll need to examine the ramifications for upgrade/downgrade compatibility, although there may be some sly ways to emit the new data without a format change.
This ticket's work should include a way to debug/dump the new extent list, so we can see that it basically works and that the information is being recorded. This ticket does not need to do anything in the way of GC itself.
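As a rough idea of what such a dump could print, the sketch below formats a list of objectid/offset/size triplets grouped by object. It is written in Python for brevity; the function name `dump_obj_extents` and the output layout are hypothetical and do not reflect any existing WiredTiger debug output.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed triplet layout: (objectid, offset, size).
Triplet = Tuple[int, int, int]


def dump_obj_extents(extents: Iterable[Triplet]) -> List[str]:
    """Return human-readable lines describing in-use ranges per object."""
    per_object: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for objectid, offset, size in extents:
        per_object[objectid].append((offset, size))

    lines: List[str] = []
    for objectid in sorted(per_object):
        ranges = sorted(per_object[objectid])
        total = sum(size for _, size in ranges)
        lines.append(
            f"object {objectid}: {len(ranges)} extent(s), {total} bytes in use"
        )
        for offset, size in ranges:
            # Print each range as a half-open interval [start, end).
            lines.append(f"  [{offset}, {offset + size}) size={size}")
    return lines


if __name__ == "__main__":
    for line in dump_obj_extents([(2, 0, 512), (1, 4096, 4096), (1, 0, 1024)]):
        print(line)
```

A per-object byte total is included because it would give a quick check that the information is being recorded as checkpoints accumulate, without implying anything about how GC would later use it.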
related to: WT-8899 Optimize BLOCK_REUSE_BYTES metric querying (Closed)