WiredTiger supports throttling I/O operations of various kinds relative to a configured capacity. Even though this is not currently used by MongoDB, I think it may be useful in the future (perhaps WT would dynamically throttle to prioritize certain activities under high stress). In any event, every read and write goes through the throttling system. However, we haven't put much thought into I/O happening in the chunk cache.
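For context, capacity throttling of this sort can be modeled as a simple reservation scheme: each operation reserves a slice of time proportional to its size and the configured bytes-per-second capacity, and sleeps until its slot arrives. This is a minimal illustration only, not WiredTiger's actual implementation; all names here are invented:

```c
#include <stdint.h>

/* Hypothetical throttle state: configured capacity plus the time already
 * reserved by earlier operations. */
struct throttle {
    uint64_t capacity_bps; /* configured capacity, bytes/second */
    uint64_t reserved_ns;  /* end of the last reserved time slot */
};

/*
 * Return how many nanoseconds the caller should wait before performing
 * `bytes` of I/O at time `now_ns`. Under capacity, the delay is zero;
 * over capacity, callers queue up behind earlier reservations.
 */
static uint64_t
throttle_delay(struct throttle *t, uint64_t now_ns, uint64_t bytes)
{
    uint64_t cost_ns = bytes * 1000000000ULL / t->capacity_bps;
    uint64_t start_ns = t->reserved_ns > now_ns ? t->reserved_ns : now_ns;

    t->reserved_ns = start_ns + cost_ns;
    return (start_ns - now_ns); /* 0 when we are under capacity */
}
```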
There are several potential points of I/O in the chunk cache. First, upon initialization, the chunk cache backing file is truncated to a specific size using ftruncate; that is being sorted out in WT-11535. Second, when a read request comes into a cold cache, we do a network call to read the chunk. Should we throttle the network request? Perhaps, but we'd probably want an independent throttle for that if we do (and how easily does the current throttle code support that?). Note also that by the time we reach this point in the chunk cache, we have already called wt_capacity_throttle: in the block manager read path, wt_bm_read unconditionally calls the throttle and then calls wt_block_read_off, which decides whether to use the block cache or do the read directly. So we probably don't want to do a regular "read" throttle until we know we're actually going to do a disk read, as opposed to a chunk cache "read" that often does no I/O.
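A sketch of the reordering this suggests, with all names invented for illustration (stand-ins rather than the real wt_* calls): the throttle charge moves out of the top-level read entry point and into the branch that actually performs disk I/O, so a cache hit is never charged.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Counters so the sketch is observable without doing real I/O. */
static uint64_t throttled_bytes;
static uint64_t disk_reads, cache_hits;

/* Stand-in for the capacity throttle: just record the charge. */
static void
capacity_throttle(size_t bytes)
{
    throttled_bytes += bytes;
}

/* Stand-in cache lookup: pretend even offsets are cached chunks. */
static bool
chunk_cache_lookup(uint64_t offset)
{
    return (offset % 2 == 0);
}

static void
disk_read(uint64_t offset, size_t bytes)
{
    (void)offset;
    /* Throttle only here, immediately before real disk I/O. */
    capacity_throttle(bytes);
    disk_reads++;
}

/* Analog of the read path: consult the cache first, throttle only on miss. */
static void
block_read(uint64_t offset, size_t bytes)
{
    if (chunk_cache_lookup(offset)) {
        cache_hits++; /* Served from memory: no throttle charge. */
        return;
    }
    disk_read(offset, bytes);
}
```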
After the network call completes, we put the value into the cache. That's a write to memory, but the memory is backed by an mmapped file. Is it an implied disk write? Probably not: the OS should be lazy about writing back mmapped data (possibly never writing it, except perhaps at file close), though it might choose to write if memory is scarce. In theory, we should be configured so that we always have enough memory, and even if the OS does write, we wouldn't know when to make a throttle call. So I don't think there's anything we should do here. When we close the mmapped file, though, we might call the throttle.
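To make the distinction concrete, here is a hedged POSIX sketch (invented names, not the chunk cache's real code): inserting into the cache is a plain memcpy into an mmapped region and charges nothing, while close is the one deterministic point where write-back is forced and a throttle call could reasonably go.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static uint64_t throttled_write_bytes;

/* Stand-in for a write-side capacity throttle charge. */
static void
capacity_throttle_write(size_t bytes)
{
    throttled_write_bytes += bytes;
}

/* Create and map a backing file of the given size. */
static char *
chunk_cache_open(const char *path, size_t len, int *fdp)
{
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return (NULL);
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return (NULL);
    }
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    *fdp = fd;
    return (p == MAP_FAILED ? NULL : p);
}

/* Inserting into the cache is a pure memory write: no throttle call. */
static void
chunk_cache_put(char *base, size_t off, const void *src, size_t len)
{
    memcpy(base + off, src, len);
}

/* Close is the one point where write-back is deterministically forced,
 * so it is the one place a throttle charge would make sense. */
static void
chunk_cache_close(char *base, size_t len, int fd)
{
    capacity_throttle_write(len);
    msync(base, len, MS_SYNC);
    munmap(base, len);
    close(fd);
}
```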
Another case to consider is when we read a cached value from memory. Again, this is an mmapped file, and again it shouldn't imply any I/O. There's an exception that is only theoretical at the moment: if we cached something previously and restarted, an item might already be in the mmapped file but not in memory, so the first access would fault it in from disk. I think our current chunk cache does not track what's cached across restarts, but if it does someday, we may need to throttle in this situation.
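If persistence across restarts were added, the throttling decision could hinge on per-chunk state, as in this hypothetical sketch (names and states invented): only the first access to a chunk that is on disk but not yet resident charges the read throttle.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-chunk state after a restart. */
enum chunk_state {
    CHUNK_ABSENT,   /* not in the cache at all */
    CHUNK_ON_DISK,  /* persisted by a previous run, not yet paged in */
    CHUNK_RESIDENT  /* in memory: reads are free */
};

static uint64_t throttled_read_bytes;

/* Stand-in for a read-side capacity throttle charge. */
static void
capacity_throttle_read(size_t bytes)
{
    throttled_read_bytes += bytes;
}

/* Return true if the chunk can be served from the cache. */
static bool
chunk_cache_get(enum chunk_state *state, size_t chunk_size)
{
    switch (*state) {
    case CHUNK_ABSENT:
        return (false); /* caller must fetch over the network */
    case CHUNK_ON_DISK:
        /* First touch faults the chunk in from disk: charge once. */
        capacity_throttle_read(chunk_size);
        *state = CHUNK_RESIDENT;
        return (true);
    case CHUNK_RESIDENT:
        return (true); /* plain memory read: no I/O, no throttle */
    }
    return (false);
}
```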
Note: This ticket assumes we want to continue to support capacity throttling. If we don't, or we want to use a different capacity model, we should reevaluate.
- related to WT-11535: test_chunkcache01 failing on linux noftruncate (Closed)