It's not currently possible to actually write out maximum-sized overflow key/value items.
The attempt fails because we use WT_ITEM structures to move data objects through the system, and writing a maximum-sized overflow item requires allocating a WT_ITEM whose data length is larger than a uint32_t can hold. The specific problem I'm seeing at the moment is that we attempt to allocate a buffer to write the overflow object, and the allocated data size is:
WT_ALIGN(overflow-size + block-header-size + page-header-size, object-allocation-size)
In other words, a huge key/value item written into a table with a large allocation size will require a chunk of memory larger than the uint32_t WT_ITEM.size field.
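To make the arithmetic concrete, here's a minimal sketch (not WiredTiger source) of how the aligned allocation for a maximum-sized overflow item overflows a uint32_t; the ALIGN macro and the header/allocation sizes are assumptions for the example:

    /* Illustrative only: shows the aligned write-buffer size exceeding UINT32_MAX. */
    #include <stdint.h>
    #include <stdio.h>

    #define ALIGN(n, v) ((((uintmax_t)(n)) + ((v) - 1)) & ~((uintmax_t)(v) - 1))

    int main(void) {
        uintmax_t overflow_size = UINT32_MAX;  /* maximum-sized overflow item */
        uintmax_t block_header_size = 512;     /* assumed header sizes */
        uintmax_t page_header_size = 64;
        uintmax_t allocation_size = 4096;      /* table allocation size */

        uintmax_t need = ALIGN(
            overflow_size + block_header_size + page_header_size, allocation_size);

        printf("need %ju bytes, uint32_t max %ju\n", need, (uintmax_t)UINT32_MAX);
        if (need > UINT32_MAX)
            printf("allocation no longer fits in a uint32_t WT_ITEM.size\n");
        return (0);
    }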
@michaelcahill, @agorrod, @sueloverso: it occurred to me that maybe the right thing to do is change the WT_ITEM.size type to a size_t (matching the WT_ITEM.memsize field).
I vaguely recall how we got here, but is there still a reason we want the data-length field to be a fixed 32-bit size?
It's an API change, and it means applications would no longer have an API-enforced limit on the size of objects they hand the engine. It's not a pretty change either, but I think it's more natural: the size of an object in memory should be a size_t, regardless of any underlying storage limits.
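For illustration, a rough before/after sketch of the proposed change; the field layout is simplified and may not match the actual declaration:

    /* Current: data length capped at 32 bits by the public API. */
    struct __wt_item_current {
        const void *data;   /* data pointer */
        uint32_t size;      /* data length: 32-bit, API-enforced 4GB limit */

        void *mem;          /* owned memory */
        size_t memsize;     /* owned memory size: already a size_t */
    };

    /* Proposed: data length widened to match memsize. */
    struct __wt_item_proposed {
        const void *data;   /* data pointer */
        size_t size;        /* data length: matches memsize, no 32-bit cap */

        void *mem;          /* owned memory */
        size_t memsize;     /* owned memory size */
    };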
Anyway, if that's a bad idea, I'll probably create a new structure (declared at the block manager layer) to use for I/O, and replace the block manager's use of WT_ITEM, along the lines of the sketch below.
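A sketch of that alternative; the WT_BM_BUF name and fields are hypothetical, not existing code:

    /* Block-manager-private I/O buffer: carries a size_t length for reads and
     * writes, leaving the public WT_ITEM API (and its uint32_t size) unchanged. */
    typedef struct {
        void  *mem;       /* buffer memory */
        size_t memsize;   /* allocated size */
        size_t size;      /* bytes to read/write */
    } WT_BM_BUF;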
- related to WT-811 huge key/value support (Closed)