Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: None
Affects Version/s: WT11.0.0
Component/s: Btree, Cache and Eviction
Labels:
Environment: Run on Ubuntu 20, Linux kernel version 5.11. The database file is stored on a Samsung 970 PRO 1TB SSD.
Assigned Teams: Storage Engines
Sprint: c(3x10^8)-StorEng - 2023-11-14, 2023-11-28 - Anthill Tiger, 2023-12-12 - Heisenbug, 2024-01-09 - I Grew Tired, StorEng - 2024-01-23, 2024-02-06 tapioooooooooooooca, 2024-02-20_A_near-death_puffin, 2024-03-05 - Claronald, 2024-03-19 - PacificOcean
Story Points: 8

Summary
When the eviction server thread finishes an eviction run, it calls __evict_clear_walk(), which clears btree->evict_ref, the saved position of the walk through this btree. As a result, the next eviction pass traverses the btree from the beginning again. If that pass also finishes before scanning the whole tree, the following pass restarts from the beginning as well, so the later parts of the tree are never scanned for eviction.

Consequently, the current eviction mechanism cannot adapt to certain (but common) access patterns: it fails to judge the hotness of data correctly and evicts hot data from the cache. Workloads that focus on the earlier part of the tree are much slower than those that focus on the later part, because the earlier part is always scanned and evicted from the cache while the later part stays resident.

Reproduction and Consequence

This issue occurs in the following setup:

I wrote an application that uses WiredTiger directly.
The database uses a single B-tree, filled sequentially with 10,000,000 key-value pairs of (16 + 72) bytes, for an uncompressed database size of ~1G. With snappy compression enabled, the on-disk file size is ~500M.
The total memory available to the application is 1G, and the WiredTiger cache size is set to 820000KB (although only ~700M is actually used by the application; this may be a separate issue). The workload issues 2,000,000 get requests from a single thread, either uniformly at random, focused on the first half of the keys (>80% of all requests), or focused on the second half. The total latency differs greatly between these cases (see the table below). The cause is the one described above: pages in the first half of the tree are always the ones chosen for eviction, regardless of the workload.
Apart from the cache size, most parameters, including those related to eviction, are left at their defaults.

I expect that any read workload with a single large btree, mild cache pressure (so that the eviction server finishes its run before scanning the whole tree), and a heavy penalty for cache misses (e.g. from decompression cost) would reproduce similar results.
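For the setup above, a rough estimate shows why the pressure is only mild. This assumes WiredTiger reads the KB suffix as 1024 bytes and ignores per-key and per-page overhead, so it is a lower bound on the in-memory size:

$$
\begin{aligned}
\text{raw payload} &= 10^{7} \times (16 + 72)\ \text{B} = 8.8 \times 10^{8}\ \text{B} \approx 839\ \text{MiB} \\
\text{configured cache} &= 820000 \times 1024\ \text{B} = 839{,}680{,}000\ \text{B} \approx 801\ \text{MiB} \\
\frac{\text{raw payload}}{\text{configured cache}} &\approx 1.05
\end{aligned}
$$

Using the reported figures instead (~700M of cache actually used against a ~1G uncompressed tree), roughly 70% of the tree fits in cache, so each eviction run needs to free only a modest amount and can stop early.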

I have modified the code and verified that only pages in the first half of the tree are frequently evicted from the cache, regardless of the access pattern.

This behavior was confirmed on WiredTiger 10.0.0 (the current master branch) and 10.0.2 (the current develop branch).

Workload | Focusing on first half | Uniform random | Focusing on second half
Total latency (2,000,000 gets) | 47 s | 36 s | 14 s


Attachments:
- 1st.png (857 kB), Sulabh Mahajan, May 25 2022 01:39:57 PM UTC
- 2nd.png (662 kB), Sulabh Mahajan, May 25 2022 01:40:03 PM UTC
- cxxopts.hpp (44 kB), Yifan Dai, May 02 2022 01:24:59 AM UTC
- image-2023-11-30-07-07-52-326.png (36 kB), Yury Ershov, Nov 29 2023 08:07:53 PM UTC
- random-descent-no-memory-histogram.png (70 kB), Yury Ershov, Feb 27 2024 11:02:09 PM UTC
- random-descent-no-memory-timeline.png (37 kB), Yury Ershov, Feb 27 2024 11:02:37 PM UTC
- random-descent-no-memory-timeline-zoom.png (14 kB), Yury Ershov, Feb 27 2024 11:04:13 PM UTC
- random-descent-timeline.png (26 kB), Yury Ershov, Feb 27 2024 10:44:40 PM UTC
- random-rescent-histogram.png (72 kB), Yury Ershov, Feb 27 2024 10:43:43 PM UTC
- screenshot-1.png (286 kB), Yury Ershov, Nov 29 2023 06:50:35 AM UTC
- screenshot-10.png (27 kB), Yury Ershov, Mar 01 2024 05:46:29 AM UTC
- screenshot-11.png (20 kB), Yury Ershov, Mar 01 2024 05:47:13 AM UTC
- screenshot-2.png (864 kB), Yury Ershov, Nov 29 2023 08:44:47 AM UTC
- screenshot-3.png (252 kB), Yury Ershov, Nov 29 2023 08:46:32 AM UTC
- screenshot-4.png (137 kB), Yury Ershov, Nov 29 2023 07:57:43 PM UTC
- screenshot-5.png (120 kB), Yury Ershov, Nov 29 2023 07:58:27 PM UTC
- screenshot-6.png (19 kB), Yury Ershov, Mar 01 2024 03:36:53 AM UTC
- screenshot-7.png (19 kB), Yury Ershov, Mar 01 2024 03:37:30 AM UTC
- screenshot-8.png (37 kB), Yury Ershov, Mar 01 2024 03:38:30 AM UTC
- screenshot-9.png (27 kB), Yury Ershov, Mar 01 2024 03:39:55 AM UTC
- screenshot-latency-50.png (158 kB), Yury Ershov, Mar 11 2024 12:05:13 AM UTC
- screenshot-latency-95.png (155 kB), Yury Ershov, Mar 11 2024 12:06:27 AM UTC
- screenshot-throughput.png (150 kB), Yury Ershov, Mar 11 2024 12:07:23 AM UTC
- simple_read.cpp (6 kB), Yifan Dai, May 02 2022 01:25:07 AM UTC
- WiredTigerStat.29.06 (4.87 MB), Yury Ershov, Dec 01 2023 03:08:14 AM UTC
- wt_trace_eviction-8.png (61 kB), Yifan Dai, Apr 11 2022 07:16:52 AM UTC

Issue Links:
- related to: WT-11300 Investigation: Failure to Queue Pages with Many Updates in Eviction Process (Closed)
- split to: WT-12643 Fix Eviction Server walk logic so that it's able to evict all pages (Closed)
- split to: WT-12644 Fix handling of hazard pointers by Eviction Server (Closed)
- split to: WT-12645 Improve visibility into Eviction Server's work regarding pages that it considers for eviction (Open)
