Type: Bug
Resolution: Unresolved
Priority: Major - P3
None
Affects Version/s: None
Component/s: Performance, Storage Engines
(copied to CRM)
5
Megabat - 2024-05-14, 2024-05-28 - FOLLOW ON SPRINT, 2024-06-11 - Dinosaurs go rawr

Recently we have received many help tickets about latency spikes from customers upgrading from 4.2 to 4.4.
We believe the root cause is the following sequence:
- A checkpoint starts, and eviction on a table is blocked.
- More writes on the table arrive, and its pages continue to grow.
- The pages grow much larger than the configured maximum page size.
- The checkpoint finishes, and forced eviction kicks in to evict these oversized pages. Because they are so large, evicting them takes longer, and reads and writes on these pages are blocked for longer, causing the latency spikes.
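The sequence above can be sketched as a toy simulation. It assumes, hypothetically, a single page, a constant write rate, and an eviction cost proportional to page size. The function `simulate` and its parameters are illustrative names, not WiredTiger APIs or defaults.

```python
"""Toy model of the checkpoint / forced-eviction sequence.

All parameters are hypothetical and chosen only for illustration; they are
not WiredTiger defaults or measurements.
"""


def simulate(checkpoint_secs: int,
             write_bytes_per_sec: int,
             max_page_bytes: int,
             evict_secs_per_mb: float) -> tuple:
    """Return (page size in bytes when the checkpoint ends, eviction stall in seconds)."""
    # Assume the page is already at the configured maximum when the checkpoint
    # starts, so without the checkpoint it would be force-evicted right away.
    page_bytes = max_page_bytes
    for _ in range(checkpoint_secs):
        # Eviction is blocked while the checkpoint runs, so every second of
        # writes makes the page larger.
        page_bytes += write_bytes_per_sec
    # Once the checkpoint finishes, forced eviction must process the whole
    # oversized page. Assumption: the cost scales linearly with page size, and
    # reads/writes on the page wait for that whole time.
    stall_secs = page_bytes / (1024 * 1024) * evict_secs_per_mb
    return page_bytes, stall_secs


for ckpt in (10, 60):
    size, stall = simulate(ckpt, write_bytes_per_sec=1 << 20,
                           max_page_bytes=10 << 20, evict_secs_per_mb=0.01)
    print(f"checkpoint {ckpt:>2}s -> page {size >> 20} MB "
          f"({size / (10 << 20):.0f}x max), stall ~{stall:.2f}s")
```

In this model the stall grows linearly with checkpoint duration and write rate. It does not, by itself, explain why 4.4 would be worse than 4.2.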
However, the same sequence can also happen in 4.2, so something in 4.4 must exacerbate it. Possible causes include longer reconciliation in 4.4 because of the history store, the IO overhead of the time points we store to disk, or checkpoint cleanup overhead.
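To see why a modest slowdown in 4.4 could matter more than it first appears, here is a hypothetical cost model built on the same assumptions as a single-page simulation. `stall_secs` and all of its inputs are illustrative, not measured. The "4.4-like" case assumes a higher per-MB reconciliation cost and, as a further assumption, a longer checkpoint.

```python
"""Hypothetical cost model: how extra per-MB reconciliation work could amplify
the post-checkpoint stall. Every number here is illustrative only."""


def stall_secs(checkpoint_secs: float,
               write_mb_per_sec: float,
               start_mb: float,
               recon_secs_per_mb: float) -> float:
    """Estimated forced-eviction stall for one page after a checkpoint ends."""
    # The page keeps growing for as long as eviction is blocked by the checkpoint.
    page_mb = start_mb + checkpoint_secs * write_mb_per_sec
    # Assumption: evicting (reconciling) the page costs a fixed amount per MB.
    return page_mb * recon_secs_per_mb


# "4.2-like" baseline versus a "4.4-like" case where reconciliation costs 50%
# more per MB and, as an extra assumption, the checkpoint also runs 50% longer.
baseline = stall_secs(checkpoint_secs=30, write_mb_per_sec=1,
                      start_mb=10, recon_secs_per_mb=0.010)
slower = stall_secs(checkpoint_secs=45, write_mb_per_sec=1,
                    start_mb=10, recon_secs_per_mb=0.015)
print(f"baseline stall ~{baseline:.3f}s, slower-reconciliation stall "
      f"~{slower:.3f}s, ratio {slower / baseline:.2f}x")
```

Under these assumptions a 1.5x per-MB slowdown yields roughly a 2x stall, because a longer checkpoint produces a bigger page and each MB of that page also costs more to evict. Whether real 4.4 workloads follow this pattern is still an open question.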
We need to understand what is really driving this vicious cycle.