- Type: Improvement
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Btree, Storage Engines
- Sprint: 2024-03-19 - PacificOcean, 2024-04-02 - GreatMugshot, 나비 (nabi) - 2024-04-16, Nick - 2024-04-30, Megabat - 2024-05-14, 2024-05-28 - FOLLOW ON SPRINT
- v8.0
__wt_update_obsolete_check() iterates over an update chain checking for obsolete updates. On workloads with long update chains, we have observed a noticeable amount of time spent on these list traversals.
This ticket proposes several lines of optimization.
Here is the core loop:
367     for (first = NULL, count = 0; upd != NULL; upd = upd->next, count++) {
368         if (upd->txnid == WT_TXN_ABORTED)
369             continue;
370
371         /*
372          * WiredTiger internal operations such as Rollback to stable and prepare transaction
373          * rollback add a globally visible tombstone to the update chain to remove the entire key.
374          * Treating these globally visible tombstones as obsolete and trimming the update list can
375          * cause problems if the update chain is getting accessed somewhere. To avoid this problem,
376          * skip these globally visible tombstones from the update obsolete check that were generated
377          * from prepare transaction rollback and not from RTS, because there are no concurrent
378          * operations run in parallel to the RTS to be affected.
379          */
380         if (upd->txnid == WT_TXN_NONE && upd->start_ts == WT_TS_NONE &&
381           upd->type == WT_UPDATE_TOMBSTONE && upd->next != NULL &&
382           upd->next->txnid == WT_TXN_ABORTED && upd->next->prepare_state == WT_PREPARE_INPROGRESS)
383             continue;
384
385         if (__wt_txn_upd_visible_all(session, upd)) {
386             if (first == NULL && WT_UPDATE_DATA_VALUE(upd))
387                 first = upd;
388         } else
389             first = NULL;
390
391         /* Cannot truncate the updates if we need to remove the updates from the history store. */
392         if (F_ISSET(upd, WT_UPDATE_TO_DELETE_FROM_HS))
393             first = NULL;
394     }
There are a few possible optimizations here.
- Linked list traversal is slow because it has poor locality: each link accesses a different and unpredictable piece of memory. At the beginning of each iteration (i.e., before line #368), we could tell the CPU to prefetch the next list element, probably via __builtin_prefetch. It won't solve the problem, but it will shave some cycles off the memory stall when we actually need that element. (h/t mathias@mongodb.com) A standalone sketch of the idea appears after this list.
- We only take the second continue, at line #383, if upd->next->txnid == WT_TXN_ABORTED. That means that on the next iteration through the loop, the condition at line #368 will be true and we will continue again. So we could save a bit of time by advancing the loop variables at line #383 and skipping an iteration; see the second sketch after this list. (This is also a risky change, since it makes the code more brittle, so we probably don't want to do it unless we believe we're seeing a lot of tombstones on obsolete checks.)
- It appears that we call __wt_update_obsolete_check(), via __wt_update_serial(), on almost every insert. That means that on a frequently-updated key, we could wind up making frequent traversals down a long update chain. Maybe we could apply a heuristic here and not check for obsolete elements on every insert? A sketch of one such heuristic appears after this list.
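For the prefetch idea, here is a minimal, self-contained sketch in plain C. It is not WiredTiger code: the node type, payload, and chain length are invented for illustration. It shows __builtin_prefetch (a GCC/Clang builtin) being issued for the next node while the current node is processed, which is essentially what would happen just before line #368 in the real loop.

    #include <stdio.h>
    #include <stdlib.h>

    struct node {
        int payload;
        struct node *next;
    };

    /* Walk the chain, hinting the CPU to fetch the next node while we work on the current one. */
    static long
    chain_sum(struct node *head)
    {
        long sum = 0;
        struct node *n;

        for (n = head; n != NULL; n = n->next) {
            /* Read prefetch (rw = 0), low temporal locality (1): each node is touched once. */
            if (n->next != NULL)
                __builtin_prefetch(n->next, 0, 1);
            sum += n->payload; /* Stand-in for the per-update visibility checks. */
        }
        return (sum);
    }

    int
    main(void)
    {
        struct node *head = NULL, *n;
        int i;

        /* Build a chain; a real experiment would use much longer chains and measure stalls. */
        for (i = 0; i < 1000; ++i) {
            n = malloc(sizeof(*n));
            n->payload = i;
            n->next = head;
            head = n;
        }
        printf("%ld\n", chain_sum(head));

        while (head != NULL) {
            n = head->next;
            free(head);
            head = n;
        }
        return (0);
    }

In the real loop the prefetch would only hide part of the next iteration's pointer-chase latency; it does not reduce the amount of work done per element.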
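For the skip-ahead idea, here is a sketch of how the continue at line #383 might be rewritten. This modifies the quoted loop rather than standing alone, and keeping count consistent with the current accounting is an assumption that would need checking.

        if (upd->txnid == WT_TXN_NONE && upd->start_ts == WT_TS_NONE &&
          upd->type == WT_UPDATE_TOMBSTONE && upd->next != NULL &&
          upd->next->txnid == WT_TXN_ABORTED && upd->next->prepare_state == WT_PREPARE_INPROGRESS) {
            /*
             * We already know upd->next is aborted, so the next iteration would hit the
             * WT_TXN_ABORTED check and continue anyway. Step onto it here; the loop's
             * increment then advances past it, saving one iteration per such tombstone.
             */
            upd = upd->next;
            count++;
            continue;
        }

As the ticket notes, this couples the branch even more tightly to the shape of the chain, so it only makes sense if profiling shows these tombstones are common during obsolete checks.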
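For the heuristic, here is a standalone sketch of one possible throttle. The names, fields, and threshold are invented for illustration and are not existing WiredTiger APIs: the idea is to run the obsolete scan only once the chain has grown by some amount since the last scan, rather than on every insert.

    #include <stdbool.h>
    #include <stdio.h>

    #define OBSOLETE_CHECK_GROWTH 16 /* Invented tuning knob: appends allowed between scans. */

    struct chain_state {
        unsigned len;              /* Current length of the update chain. */
        unsigned len_at_last_scan; /* Length when the last obsolete scan ran. */
    };

    /* Decide whether this append should pay for the (expensive) obsolete scan. */
    static bool
    should_check_obsolete(struct chain_state *cs)
    {
        if (cs->len - cs->len_at_last_scan < OBSOLETE_CHECK_GROWTH)
            return (false);
        cs->len_at_last_scan = cs->len;
        return (true);
    }

    int
    main(void)
    {
        struct chain_state cs = {0, 0};
        unsigned i, scans = 0;

        /* Simulate appends to one hot key and count how often we would scan. */
        for (i = 0; i < 1000; ++i) {
            cs.len++;
            if (should_check_obsolete(&cs))
                scans++;
        }
        printf("appends=%u scans=%u\n", i, scans);
        return (0);
    }

The trade-off is that obsolete updates linger somewhat longer before being trimmed, so the threshold would need to balance traversal cost against the memory held by the chain.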
- is duplicated by: WT-12362 Reduce time spent holding locks when updating a record (Closed)
- is related to: SERVER-88327 Update config.transactions less frequently for retryable writes (Backlog)
- related to: WT-12362 Reduce time spent holding locks when updating a record (Closed)