- Type: Task
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Storage Engines
- StorEng - Defined Pipeline
When WiredTiger rolls back a write transaction, it does not free the space used by the updates it aborts. Instead these updates remain on their update chains (marked with a transaction ID of WT_TXN_ABORTED), until the update chain is cleaned up as a part of eviction, checkpoint, obsolete checking, etc.
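To make the memory behavior concrete, here is a minimal single-threaded sketch of an update chain in which rollback only re-labels updates. The `sketch_*` names and the simplified struct layout are hypothetical and are not WiredTiger's actual definitions. `SKETCH_TXN_ABORTED` stands in for `WT_TXN_ABORTED`, and its value is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for WT_TXN_ABORTED; the value is an assumption. */
#define SKETCH_TXN_ABORTED UINT64_MAX

/* Simplified, hypothetical update: metadata and payload share one allocation. */
struct sketch_update {
    uint64_t txnid;             /* owning transaction, or the aborted marker */
    struct sketch_update *next; /* next (older) update in the chain */
    size_t size;                /* payload length in bytes */
    unsigned char data[];       /* payload stored inline */
};

/* Prepend a new update; returns the new head, or NULL if allocation fails. */
static struct sketch_update *
sketch_prepend(struct sketch_update *head, uint64_t txnid, const void *buf, size_t size)
{
    struct sketch_update *upd;

    assert(buf != NULL);
    if ((upd = malloc(sizeof(*upd) + size)) == NULL)
        return (NULL);
    upd->txnid = txnid;
    upd->next = head;
    upd->size = size;
    memcpy(upd->data, buf, size);
    return (upd);
}

/* Roll back a transaction: mark its updates aborted. Nothing is unlinked or freed. */
static void
sketch_rollback(struct sketch_update *head, uint64_t txnid)
{
    for (; head != NULL; head = head->next)
        if (head->txnid == txnid)
            head->txnid = SKETCH_TXN_ABORTED;
}

/* Bytes held by the chain, aborted updates included. */
static size_t
sketch_chain_bytes(const struct sketch_update *head)
{
    size_t total = 0;

    for (; head != NULL; head = head->next)
        total += sizeof(*head) + head->size;
    return (total);
}
```

In this sketch, calling `sketch_chain_bytes` before and after `sketch_rollback` returns the same number: the aborted payload keeps occupying memory until some later pass removes the update from the chain.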
We have seen pathological cases where this is a problem. A transaction aborts because it dirties too much data and the server retries, but by then less space is available because the remnants of the aborted transaction are still in the cache, so the retry fails the same way and the process repeats. Because such a transaction is probably dirtying a lot of records, this also increases cache pressure, which affects other operations and triggers more work for eviction.
In these cases (and maybe other times?) it would be beneficial if we could remove aborted updates from the cache at the time of transaction rollback.
As I understand it, we can't remove aborted updates from their update chains immediately because concurrent operations may be traversing those chains: we're operating lock-free here.
One path to freeing aborted updates might be to split WT_UPDATE into two pieces:
- The metadata about the update – timestamps, flags, etc. Everything in WT_UPDATE before WT_UPDATE.data.
- The actual data.
I.e., have WT_UPDATE point to the payload, rather than contain it. This would allow us to free the data storage during rollback while leaving a smaller record of the aborted operation in the update chain.
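The split described above might look roughly like the following single-threaded sketch. All `split_*` names and the struct layout are hypothetical illustrations rather than a proposed WiredTiger patch, and the sketch deliberately ignores the question of how to free the payload safely while other threads may hold a pointer to it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for WT_TXN_ABORTED; the value is an assumption. */
#define SPLIT_TXN_ABORTED UINT64_MAX

/* Hypothetical metadata half: stays on the chain even after rollback. */
struct split_update {
    uint64_t txnid;            /* owning transaction, or the aborted marker */
    struct split_update *next; /* next (older) update in the chain */
    size_t size;               /* payload length in bytes */
    unsigned char *payload;    /* separately allocated data; NULL once freed */
};

/* Prepend a new update; note the two allocator calls per update. */
static struct split_update *
split_prepend(struct split_update *head, uint64_t txnid, const void *buf, size_t size)
{
    struct split_update *upd;

    assert(buf != NULL);
    if ((upd = malloc(sizeof(*upd))) == NULL) /* allocation 1: metadata */
        return (NULL);
    /* allocation 2: payload (request at least 1 byte so NULL only means failure) */
    if ((upd->payload = malloc(size > 0 ? size : 1)) == NULL) {
        free(upd);
        return (NULL);
    }
    memcpy(upd->payload, buf, size);
    upd->txnid = txnid;
    upd->next = head;
    upd->size = size;
    return (upd);
}

/* Roll back: mark updates aborted and release only their payloads. */
static void
split_rollback(struct split_update *head, uint64_t txnid)
{
    for (; head != NULL; head = head->next)
        if (head->txnid == txnid) {
            head->txnid = SPLIT_TXN_ABORTED;
            free(head->payload);
            head->payload = NULL;
            head->size = 0;
        }
}

/* Bytes held by the chain: every metadata record plus any live payload. */
static size_t
split_chain_bytes(const struct split_update *head)
{
    size_t total = 0;

    for (; head != NULL; head = head->next)
        total += sizeof(*head) + (head->payload != NULL ? head->size : 0);
    return (total);
}
```

Here a rolled-back update shrinks to `sizeof(struct split_update)`, so the saving grows with payload size and is negligible for small values. The extra `malloc` in `split_prepend` is the per-operation cost.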
An obvious drawback here is doubling the number of memory allocator calls for each operation. It would also be an extensive code change (although not too complex). So I don't think this change would be worthwhile unless we find evidence of substantive benefit outside of a couple of edge-case workloads.