WiredTiger / WT-3745

Don't stall reads due to write pressure

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 3.6.0-rc5, WT3.0.0
    • Affects Version/s: None
    • Component/s: None
    • Storage 2017-12-04
    • v3.6

      WiredTiger uses several thresholds to manage its cache. In particular, with default settings, all application operations are throttled once dirty content reaches 20% of the cache.

      However, this behavior combines with MongoDB's replication machinery to create a vicious cycle where heavy update workloads generate a lot of cache pressure on the primary. Secondaries can only apply the oplog as fast as they can read it from the primary, so some replication lag is common during heavy write workloads.

      With readConcern majority always on in 3.6, replication lag generates further cache pressure on the primary as it maintains history for majority reads. This can in turn slow down secondary reads of the oplog when the primary is overwhelmed by updates.

      Further, once lookaside eviction is required, pages can be evicted from cache and read back with history, which leaves them marked dirty. This adds to cache pressure on primaries, and in particular increases the amount of dirty content in cache.

      Investigate throttling only update operations when the dirty cache limit is reached, while allowing reads to proceed. Further, investigate the situations that cause oplog reads to block, and attempt to tune behavior so that oplog reads keep making progress.
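
The difference between the current and the proposed throttling rule can be illustrated with a small model. This is only a sketch of the policy as described in this ticket, not WiredTiger code: the names `Op`, `throttled_current`, `throttled_proposed` and `DIRTY_TRIGGER_PCT` are hypothetical, and the 20% value is taken from the description above.

```python
from enum import Enum

# Dirty-content limit, as a percentage of the cache size.
# The 20% figure comes from the ticket text; the name is made up for this sketch.
DIRTY_TRIGGER_PCT = 20


class Op(Enum):
    READ = "read"
    UPDATE = "update"


def over_dirty_limit(dirty_bytes: int, cache_bytes: int,
                     trigger_pct: int = DIRTY_TRIGGER_PCT) -> bool:
    """Return True once dirty content reaches trigger_pct percent of the cache."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return dirty_bytes * 100 >= trigger_pct * cache_bytes


def throttled_current(op: Op, dirty_bytes: int, cache_bytes: int) -> bool:
    """Behavior described in the ticket: every operation is throttled at the limit."""
    return over_dirty_limit(dirty_bytes, cache_bytes)


def throttled_proposed(op: Op, dirty_bytes: int, cache_bytes: int) -> bool:
    """Proposed behavior: only updates are throttled; reads proceed."""
    return op is Op.UPDATE and over_dirty_limit(dirty_bytes, cache_bytes)


if __name__ == "__main__":
    cache = 1_000_000
    dirty = 250_000  # 25% dirty, above the limit
    for op in Op:
        print(op.value,
              "current:", throttled_current(op, dirty, cache),
              "proposed:", throttled_proposed(op, dirty, cache))
```

Under this model an oplog read issued by a secondary is never delayed by the dirty limit alone, which is the property the ticket asks to investigate. Any other cause of blocked reads is outside what the sketch covers.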

        1. insert_ttl.js
          4 kB
          Susan LoVerso

            Assignee:
            michael.cahill@mongodb.com Michael Cahill (Inactive)
            Reporter:
            michael.cahill@mongodb.com Michael Cahill (Inactive)