- Type: Improvement
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
A typical WT data file may contain regions that are no longer in use, for example when a checkpoint that was the only reference to some number of blocks is deleted. In a workload with many deletes or truncates, such gaps may make up a larger proportion of the file.
Once we decide (via flush_tier) that a data file is finalized and will become read-only, we have an opportunity when writing it to the cloud: we can look at the extent lists for all active checkpoints in the file, and simply not write any blocks that aren't referenced there. Each object written to the cloud would probably need a header indicating which blocks are missing. That header would be used when the file is read, either to "reconstruct" the file on disk or to feed any sort of "disk file fragment" cache we have in operation (a sketch follows below). Accesses to the gaps themselves "shouldn't happen".
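To make the header idea concrete, here is a minimal sketch of the read-side offset translation, assuming a hypothetical header that records the omitted ranges as sorted (offset, length) pairs. None of these names (gap_entry, gap_translate) are existing WiredTiger API; the actual block manager would presumably derive the gap list from the checkpoints' extent lists at flush_tier time, and the write side would use the same sorted list to copy only the live ranges into the cloud object.

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* One omitted range in the original file (hypothetical layout). */
    struct gap_entry {
        uint64_t file_offset; /* Start of the gap in the logical file. */
        uint64_t length;      /* Bytes not written to the cloud object. */
    };

    /*
     * Translate a logical file offset to its offset within the compacted
     * object data by subtracting the lengths of all gaps that precede it.
     * The gap list is assumed sorted by file_offset. A read landing inside
     * a gap "shouldn't happen", so it returns -1.
     */
    static int64_t
    gap_translate(const struct gap_entry *gaps, uint32_t gap_count,
        uint64_t logical_off)
    {
        uint64_t skipped = 0;

        for (uint32_t i = 0; i < gap_count; i++) {
            if (logical_off < gaps[i].file_offset)
                break;
            if (logical_off < gaps[i].file_offset + gaps[i].length)
                return (-1);
            skipped += gaps[i].length;
        }
        return ((int64_t)(logical_off - skipped));
    }

    int
    main(void)
    {
        /* Gaps at [4K,8K) and [16K,20K): 8K of the file is never written. */
        struct gap_entry gaps[] = {{4096, 4096}, {16384, 4096}};

        /* Offset 20K in the logical file lands at 12K in the object data. */
        printf("%" PRId64 "\n", gap_translate(gaps, 2, 20480)); /* 12288 */
        printf("%" PRId64 "\n", gap_translate(gaps, 2, 5000));  /* -1: in a gap */
        return (0);
    }

A sorted gap list keeps the header small when gaps are few and large, and the translation stays a single forward scan (or binary search) per read.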
Note that this optimization may be useful only when using tiered storage that isn't shared. When we start to share tiered storage, we will probably be using union tables, which will have the effect of creating tiered data files that are "tight", without any gaps. So it's essential to look at any expected gains in this light.