WiredTiger / WT-6175

tcmalloc fragmentation is worse in 4.4 with durable history

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: WT10.0.0, 4.4.0-rc10, 4.7.0
    • Affects Version/s: 4.4.0-rc4
    • Component/s: None

      Issue Status as of Nov 2, 2020

      ISSUE DESCRIPTION AND IMPACT

      The accumulation of many small data structures (typically associated with inserts and updates) in the WiredTiger cache can cause the system's memory allocator to use more space than WiredTiger requests. Historically, the main mechanism for limiting the impact of fragmentation has been to cap the amount of dirty data that can accumulate in the cache at 20%. The precise limit can be controlled using the eviction_dirty_trigger configuration option.
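
      As a rough sketch (not taken from this ticket), the same limit can be set directly through WiredTiger's C API; the home directory and cache size below are placeholders:

        #include <stdio.h>
        #include <stdlib.h>
        #include <wiredtiger.h>

        int
        main(void)
        {
            WT_CONNECTION *conn;
            int ret;

            /*
             * eviction_dirty_trigger is a percentage of the configured
             * cache size: once dirty content passes it (default 20%),
             * application threads are pulled into eviction until usage
             * falls back toward eviction_dirty_target (default 5%).
             */
            if ((ret = wiredtiger_open("WT_HOME", NULL,
                     "create,cache_size=1GB,"
                     "eviction_dirty_target=5,eviction_dirty_trigger=20",
                     &conn)) != 0) {
                fprintf(stderr, "wiredtiger_open: %s\n",
                    wiredtiger_strerror(ret));
                return (EXIT_FAILURE);
            }

            /* ... run the workload ... */

            return (conn->close(conn, NULL) == 0 ? EXIT_SUCCESS : EXIT_FAILURE);
        }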

      However, some WiredTiger cache pages with many associated small memory allocations can remain in cache after a checkpoint and be marked clean. The clean/dirty distinction helps limit the amount of work done in checkpoints, but it is only an indirect estimate of memory allocator fragmentation, since clean pages can still hold the small allocations that cause it.

      With the introduction of durable history in MongoDB 4.4, the small memory allocations associated with inserts and updates contribute more to fragmentation than they did in previous versions.

      To address this, we are now:

      • Tracking insert and update data structures as a separate attribute of cache usage.
      • Extending the cache eviction process to manage the proportion of cache associated with small allocations, similarly to how it manages clean and dirty content.
      • Adding a configurable trigger (eviction_updates_trigger) on the amount of small objects in the cache, to prompt eviction of that content. The default value is eviction_dirty_trigger / 2 (10%).
      • Adding a configurable target (eviction_updates_target) to serve as a goal for the eviction process. The default value is eviction_dirty_target / 2 (2.5%). A sketch of adjusting these options follows this list.
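
      A minimal sketch of adjusting the new trigger at runtime via WT_CONNECTION::reconfigure (the option is also accepted at wiredtiger_open):

        #include <wiredtiger.h>

        /*
         * Sketch: apply the new updates-eviction trigger to a live
         * connection; eviction settings are reconfigurable, so no
         * restart is needed.
         */
        static int
        tune_updates_eviction(WT_CONNECTION *conn)
        {
            /*
             * Pull application threads into eviction when update
             * structures exceed 10% of the cache
             * (eviction_dirty_trigger / 2). Left unset,
             * eviction_updates_target keeps its default of
             * eviction_dirty_target / 2 (2.5%).
             */
            return (conn->reconfigure(conn, "eviction_updates_trigger=10"));
        }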

      DIAGNOSIS AND AFFECTED VERSIONS

      This change is introduced in WT3.2.2, MongoDB 4.4+.

      A deployment running with the default configuration and servicing workloads that generate a large number of small objects may be governed more by the new updates triggers than by the generic dirty triggers. If this occurs, you will notice that the cache dirty percentage tends toward the eviction_updates_target of 2.5% rather than the eviction_dirty_target of 5%.
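
      From a standalone WiredTiger application, one way to watch this is a statistics cursor. The sketch below assumes statistics were enabled at wiredtiger_open (for example, statistics=(fast)); the WT_STAT_CONN_* key names should be checked against the wiredtiger.h of the build in use. In a MongoDB deployment the same numbers appear under serverStatus().wiredTiger.cache.

        #include <stdint.h>
        #include <stdio.h>
        #include <wiredtiger.h>

        /* Sketch: compute the cache dirty percentage from connection
         * statistics. */
        static int
        print_cache_dirty_pct(WT_SESSION *session)
        {
            WT_CURSOR *cursor;
            const char *desc, *pvalue;
            int64_t bytes_dirty, bytes_max;
            int ret;

            if ((ret = session->open_cursor(
                     session, "statistics:", NULL, NULL, &cursor)) != 0)
                return (ret);

            /* "cache: tracked dirty bytes in the cache" */
            cursor->set_key(cursor, WT_STAT_CONN_CACHE_BYTES_DIRTY);
            if ((ret = cursor->search(cursor)) == 0)
                ret = cursor->get_value(cursor, &desc, &pvalue, &bytes_dirty);

            /* "cache: maximum bytes configured" */
            if (ret == 0) {
                cursor->set_key(cursor, WT_STAT_CONN_CACHE_BYTES_MAX);
                if ((ret = cursor->search(cursor)) == 0)
                    ret = cursor->get_value(cursor, &desc, &pvalue, &bytes_max);
            }

            if (ret == 0)
                printf("cache dirty: %.2f%%\n",
                    100.0 * (double)bytes_dirty / (double)bytes_max);

            (void)cursor->close(cursor);
            return (ret);
        }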

      REMEDIATION AND WORKAROUNDS

      These changes in eviction behavior are expected and should be evaluated in the context of how clients of the MongoDB server are affected, if at all.

      Original description

      This isn't new with 4.4.0-rc4; it has been an issue in all of the 4.4 release candidates I tried. HELP-13660 has a possible explanation for the trigger: 1) modify many documents, then 2) run queries that require long-running scans.

      My test case is Linkbench with a large database. The workload is: 1) load the database, 2) create a secondary index on one of the collections, and 3) run transactions. The problem happens at step 2, which does a scan during index creation. The test database is ~200G with Snappy compression, and WiredTiger has cacheSizeGB=40.

      I dump tcmalloc stats after each step. Much more detail is here and the summary is listed below.
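
      (Aside, not from the ticket: mongod exposes these numbers through serverStatus; for a standalone repro linked against gperftools tcmalloc, a similar summary can be dumped through the C shim in gperftools/malloc_extension_c.h, as sketched below.)

        #include <stdio.h>
        #include <gperftools/malloc_extension_c.h>

        /*
         * Sketch: print tcmalloc's human-readable stats (pageheap free
         * and unmapped bytes, central cache, and so on) after each step
         * of the workload.
         */
        static void
        dump_tcmalloc_stats(void)
        {
            char buf[1 << 16];

            MallocExtension_GetStats(buf, (int)sizeof(buf));
            fputs(buf, stderr);
        }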

      For 4.4.0-rc4, VSZ for the mongod process is ~9G larger after create index compared to VSZ for 4.2.6 or 4.4 prior to the durable history merge.

      This can be reproduced with the Linkbench2 workload that is in DSI, although:
      1) it will have to be changed to create the secondary index after the load, and
      2) I use maxid1=200M while the code in DSI now uses maxid1=10M.

      I am not sure whether Henrik added a repro to DSI for this when he did the work leading to HELP-13660.

      Attachments:

        1. 3stacks.png
          155 kB
          Bruce Lucas
        2. comparison.png
          177 kB
          Bruce Lucas
        3. fragmentation.png
          156 kB
          Bruce Lucas
        4. growth.png
          245 kB
          Bruce Lucas
        5. hpe.426.tar
          44.26 MB
          Mark Callaghan
        6. linkbench-10G.png
          529 kB
          Michael Cahill
        7. metrics.2020-05-08T14-09-24Z-00000.r1
          9.93 MB
          Mark Callaghan
        8. metrics.2020-05-08T20-17-36Z-00000.r1
          10.00 MB
          Mark Callaghan
        9. metrics.2020-05-09T00-53-05Z-00000.r1
          621 kB
          Mark Callaghan
        10. metrics.interim
          190 kB
          Mark Callaghan
        11. metrics.interim.r1
          22 kB
          Mark Callaghan
        12. repro-32-5G.png
          367 kB
          Michael Cahill
        13. wt6175.lb200m.may14.tar
          48.80 MB
          Mark Callaghan

            Assignee:
            michael.cahill@mongodb.com Michael Cahill (Inactive)
            Reporter:
            mark.callaghan@mongodb.com Mark Callaghan (Inactive)
            Votes:
            0
            Watchers:
            29
