Core Server / SERVER-26923

OOM Killer Terminates All 3 Nodes in a Shard Using WiredTiger

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical - P2
    • None
    • Affects Version/s: 3.0.11, 3.2.10
    • Component/s: Text Search
    • None
    • ALL

      1. Configure shard to use wiredTiger.
      2. Wait an indefinite period of time while automated tests are running (6-12 hours).
      3. Identify the oom-killer and resulting crash in the server's logs.
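
Step 3 can be partly automated. The following is a minimal sketch, not part of the original report: it scans log lines for oom-killer activity. The regular expressions are modeled on common Linux kernel message wording, which varies between kernel versions, and the function name `find_oom_events` and the `SAMPLE` lines are hypothetical, illustrative values rather than output from the affected servers.

```python
import re

# Patterns modeled on common Linux kernel messages (an assumption:
# the exact wording differs between kernel versions).
INVOKED = re.compile(r"(?P<name>\S+) invoked oom-killer")
KILLED = re.compile(r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)")


def find_oom_events(lines):
    """Return (kind, process name, pid or None) for each OOM-related line."""
    events = []
    for line in lines:
        match = KILLED.search(line)
        if match:
            events.append(("killed", match.group("name"), int(match.group("pid"))))
            continue
        match = INVOKED.search(line)
        if match:
            events.append(("invoked", match.group("name"), None))
    return events


# Synthetic sample lines for illustration only; not taken from the report.
SAMPLE = [
    "kernel: mongod invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0",
    "kernel: Out of memory: Kill process 1234 (mongod) score 950 or sacrifice child",
    "kernel: Killed process 1234 (mongod) total-vm:1024kB, anon-rss:512kB, file-rss:0kB",
]

if __name__ == "__main__":
    for event in find_oom_events(SAMPLE):
        print(event)
```

To check a real host, pass an open kernel or system log file to `find_oom_events` instead of `SAMPLE`; the log location depends on the distribution, so no path is assumed here.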


      Hello,

      We've recently upgraded our MongoDB deployment to 3.2.10. During this upgrade we intended to migrate the storage engine to the new WiredTiger engine but ran into stability issues. Seemingly at random throughout the day, all data-bearing nodes would crash after being terminated by the oom-killer.

      There are many memory leak issues with WiredTiger in JIRA, most of them fixed. The one we hoped would help was fixed in 3.2.10 (WT-2796), but we still ran into the same problem while running automated tests (not particularly stressful ones) on our cluster.

      We have a multi-environment deployment, and the problem appeared in all lower environments, causing us to reverse the decision to migrate to WiredTiger until we find a way to stabilize it.

      We are using the 1.11 C# driver in our application. The higher environments are both 5 sharded clusters with 3 data-bearing nodes in each shard's replica set. The config servers are not configured as a replica set and will not be migrated to WiredTiger at this point. Our application is cloud-hosted in AWS, and the number of servers running mongos.exe locally scales up and down automatically according to the load on service queues.

      Please let me know if I can provide any further information. This bug is marked as critical because it involves a severe memory leak, per the table of priorities.

      Thanks,
      Shy

            Assignee: Kelsey Schubert (kelsey.schubert@mongodb.com)
            Reporter: Shy Tamir (shy@tegrity.com)
            Votes: 0
            Watchers: 5
