Type: Bug
Resolution: Fixed
Priority: Major - P3
Affects Version/s: 4.4.0-rc4
Component/s: None
ISSUE DESCRIPTION AND IMPACT
The accumulation of many small data structures (typically associated with inserts and updates) in the WiredTiger cache can cause the system's memory allocator to use more space than WiredTiger requests. Historically, the main mechanism for limiting the impact of this fragmentation has been to cap the amount of dirty data that can accumulate in the cache at 20%. The precise limit can be controlled using the eviction_dirty_trigger configuration option.
However, some WiredTiger cache pages with many associated small memory allocations can remain in cache after a checkpoint and be marked as clean. The clean/dirty distinction helps limit the amount of work done in checkpoints, but as a result the dirty limit is only an approximate control on memory allocator fragmentation.
With the introduction of durable history in MongoDB 4.4, the small memory allocations associated with these small objects contribute more to fragmentation than in previous versions.
To address this, we are now:
- Tracking insert and update data structures as a separate attribute of cache usage.
- Extending the cache eviction process to manage the proportion of cache associated with small allocations, similarly to how it manages clean and dirty content.
- Adding a configurable trigger (eviction_updates_trigger) on the proportion of the cache consumed by these small objects, to prompt eviction of that content. The default value is eviction_dirty_trigger / 2 (10%).
- Adding a configurable target (eviction_updates_target) to serve as a goal for the eviction process. The default value is eviction_dirty_target / 2 (2.5%). A configuration sketch follows this list.
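For illustration only, the sketch below shows how these options could be adjusted at runtime on a MongoDB 4.4+ node through the wiredTigerEngineRuntimeConfig server parameter. The option names come from this change; the connection string and the percentage values are placeholder assumptions, not recommendations.

```python
# Minimal sketch: tuning the new update-eviction settings at runtime via
# MongoDB's wiredTigerEngineRuntimeConfig server parameter.
# The values below are illustrative placeholders, not recommendations.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local mongod

cmd = {
    "setParameter": 1,
    "wiredTigerEngineRuntimeConfig": (
        "eviction_updates_trigger=10,"  # begin update eviction at 10% of cache
        "eviction_updates_target=2"     # evict update content down to 2% of cache
    ),
}
print(client.admin.command(cmd))
```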
DIAGNOSIS AND AFFECTED VERSIONS
This change is introduced in WiredTiger 3.2.2 and MongoDB 4.4+.
A deployment running with the default configuration and servicing workloads that generate a large number of small objects may be governed more by the new update triggers than by the generic dirty triggers. If this occurs, you will notice that the cache dirty % tends toward the eviction_updates_target of 2.5% rather than the eviction_dirty_target of 5%, as sketched below.
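One way to watch for this, sketched below under stated assumptions, is to compute the dirty and updates percentages from the WiredTiger cache section of serverStatus. The first three statistic names are long-standing WiredTiger cache statistics; "bytes allocated for updates" is assumed to be the 4.4+ statistic backing the new triggers, so verify it against your build.

```python
# Minimal sketch: deriving cache dirty % and cache updates % from the
# WiredTiger cache section of serverStatus.
# "bytes allocated for updates" is an assumed 4.4+ statistic name;
# verify it against your server version.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local mongod
cache = client.admin.command("serverStatus")["wiredTiger"]["cache"]

configured = cache["maximum bytes configured"]
dirty = cache["tracked dirty bytes in the cache"]
updates = cache.get("bytes allocated for updates", 0)

print(f"cache dirty:   {100.0 * dirty / configured:.2f}%")
print(f"cache updates: {100.0 * updates / configured:.2f}%")
```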
REMEDIATION AND WORKAROUNDS
These changes in eviction behavior are expected and should be evaluated in the context of how clients of the MongoDB server are affected, if at all.
ORIGINAL DESCRIPTION
This isn't new in 4.4.0-rc4; it has been an issue in all of the 4.4 release candidates I tried. HELP-13660 has a possible explanation for the trigger: 1) modify many documents and then 2) run queries that require long-running scans.
My test case is Linkbench with a large database. The workload is to 1) load the database, 2) create a secondary index on one of the collections, and 3) run transactions. The problem happens at step 2, which performs a scan during index creation. The test database is ~200G with Snappy compression, and WiredTiger has cacheSizeGB=40.
I dump tcmalloc stats after each step. Much more detail is here and the summary is listed below.
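For reference, a minimal sketch of dumping those stats between steps, assuming the serverStatus command on this build accepts a verbosity value for its tcmalloc section (as discussed in SERVER-48324):

```python
# Minimal sketch: dumping tcmalloc allocator statistics between steps.
# Passing a verbosity value for the "tcmalloc" section of serverStatus
# is assumed to return the detailed allocator statistics; adjust for
# your server version.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local mongod
status = client.admin.command({"serverStatus": 1, "tcmalloc": 2})
print(status.get("tcmalloc"))  # generic and detailed tcmalloc stats
```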
For 4.4.0-rc4, VSZ for the mongod process is ~9G larger after create index compared to VSZ for 4.2.6 or 4.4 prior to the durable history merge.
This can be reproduced with the Linkbench2 workload in DSI, although:
1) it will have to be changed to create the secondary index after the load, and
2) I use maxid1=200M while the code in DSI now uses maxid1=10M.
I am not sure whether Henrik added a repro to DSI for this when he did the work leading to HELP-13660.
is related to:
- SERVER-48395 Extended stalls during heavy insert workload (Closed)

related to:
- SERVER-20306 75% excess memory usage under WiredTiger during stress test (Closed)
- WT-7190 Limit eviction of non-history store pages when checkpoint is operating on history store (Closed)
- SERVER-48324 Expose parameter to include tcmalloc verbose statistics in ftdc (Closed)
- WT-6203 Consider increasing default cache_overhead setting (Closed)
- WT-8996 Review update eviction queuing logic (Closed)