  1. Core Server
  2. SERVER-54805

Mongo becomes unresponsive, spike in connections and FDs

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • None
    • Affects Version/s: 3.6.17, 4.0.23
    • Component/s: MMAPv1
    • None
    • ALL
    • v5.0
    • Repl 2021-06-14, Repl 2021-06-28

      System Configuration

      OS Version: CentOS 8
      cat /etc/redhat-release
      CentOS Linux release 8.1.1911 (Core)
      Mongo Version: 3.6.17
      Storage Engine: MMAPv1
      Storage Type: dbPath on tmpfs

      Server Configuration:
      RAM: 160 GB
      HD: 100GB

      free -g
                    total        used        free      shared  buff/cache   available
      Mem:            157           7          32          51         117          96
      Swap:             3           0           3

      Problem Description

      We recently upgraded from 3.6.9 to 3.6.17 because 3.6.17 is the release that supports CentOS 8. Since this upgrade, replica set members frequently become unresponsive (to both the mongo shell and the Java API) after 1-2 days of uptime. So far only secondary members have been affected. While debugging, we found that lsof on the hung mongod process reports a very large number of open files, around 32K or more. netstat does not show a comparable number of connections, but once the lsof count reaches around 35K the process crashes. RAM and CPU are not the bottleneck, as there is plenty of free memory. The dbPath is on tmpfs.
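One way to cross-check the lsof figure is to count descriptors directly from /proc, since lsof output can also include rows for memory-mapped files and other entries that are not file descriptors, which may inflate the total for an MMAPv1 process. The Python sketch below is an illustration only and not part of the original report: the function names are hypothetical, and it assumes a Linux /proc filesystem and permission to read the mongod process's fd directory.

```python
import os
from collections import Counter


def classify_fd_target(target: str) -> str:
    """Map a /proc/<pid>/fd symlink target to a coarse category."""
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("anon_inode:"):
        return "anon_inode"
    # Anything else is treated as a regular path (data file, log, device).
    return "file"


def summarize_fd_targets(targets) -> Counter:
    """Count descriptor categories for an iterable of symlink targets."""
    return Counter(classify_fd_target(t) for t in targets)


def read_fd_targets(pid: int) -> list:
    """Read the symlink targets under /proc/<pid>/fd (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets
```

Calling `summarize_fd_targets(read_fd_targets(pid))` with the mongod PID gives a per-category count. If the socket count tracks the netstat figure while the lsof total is much larger, the difference is likely coming from rows that are not descriptors.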

      Per the Production/Operations Checklist, ulimit is not a concern either. However, vm.max_map_count is set to 65530, whereas MongoDB recommends 128000. When we increase the value, the replica member recovers immediately, but the connection count and the lsof count do not decrease. We are probably just postponing the crash by another week or so, so we are not sure how this kernel parameter is supposed to help.
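Whether vm.max_map_count is the limit being hit can be checked by comparing the number of memory mappings the process holds with the kernel setting. Each line of /proc/<pid>/maps is one mapping, and both memory-mapped data files and thread stacks consume mappings, so a process with a growing number of connection threads may approach the limit. The following is a minimal Python sketch under those assumptions (Linux only; the function names are hypothetical and not part of any MongoDB tooling).

```python
def count_mappings(maps_text: str) -> int:
    """Count mappings in the text of /proc/<pid>/maps (one per non-empty line)."""
    return sum(1 for line in maps_text.splitlines() if line.strip())


def mapping_headroom(maps_text: str, max_map_count: int) -> dict:
    """Compare the mappings in use with the vm.max_map_count limit."""
    used = count_mappings(maps_text)
    return {
        "used": used,
        "limit": max_map_count,
        "remaining": max_map_count - used,
        "percent_used": round(100.0 * used / max_map_count, 1),
    }


def read_headroom(pid: int) -> dict:
    """Read the live values for a process from /proc (needs read permission)."""
    with open(f"/proc/{pid}/maps") as f:
        maps_text = f.read()
    with open("/proc/sys/vm/max_map_count") as f:
        limit = int(f.read().strip())
    return mapping_headroom(maps_text, limit)
```

Sampling `read_headroom(pid)` periodically would show whether `used` climbs steadily toward the limit. If it does, raising vm.max_map_count would only delay the failure, which would be consistent with the behaviour described in the report.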

       

      We have seen similar issues reported in JIRA, but those cases were closed because the submitters stopped responding.

      https://jira.mongodb.org/browse/SERVER-46701

      https://jira.mongodb.org/browse/SERVER-40625

      The mongod logs show:

        2021-02-24T22:50:02.603+0000 I -        [listener] pthread_create failed: Resource temporarily unavailable
        2021-02-24T22:50:02.603+0000 W EXECUTOR [conn480777] Terminating session due to error: InternalError: failed to create service entry worker thread
        2021-02-24T22:50:03.686+0000 I -        [listener] pthread_create failed: Resource temporarily unavailable
        2021-02-24T22:50:03.686+0000 W EXECUTOR [conn480778] Terminating session due to error: InternalError: failed to create service entry worker thread
        2021-02-24T22:50:03.690+0000 I -        [listener] pthread_create failed: Resource temporarily unavailable
        2021-02-24T22:50:03.690+0000 W EXECUTOR [conn480779] Terminating session due to error: InternalError: failed to create service entry worker thread
        2021-02-24T22:50:03.709+0000 I -        [listener] pthread_create failed: Resource temporarily unavailable
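To see how quickly the thread-creation failures accumulate, the log can be summarized per second. The Python sketch below is illustrative only; it assumes the plain-text log format shown in the excerpt above, and the function name is hypothetical. It scans the whole text with a regular expression rather than splitting on newlines, so it also works if entries have been run together.

```python
import re
from collections import Counter

# Matches the listener's failure entries and captures the timestamp
# truncated to whole seconds.
PTHREAD_FAILURE = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.\d{3}[+-]\d{4}\s+I\s+-\s+"
    r"\[listener\] pthread_create failed"
)


def failures_per_second(log_text: str) -> Counter:
    """Count 'pthread_create failed' entries grouped by second."""
    return Counter(PTHREAD_FAILURE.findall(log_text))
```

For the excerpt above this gives one failure at 22:50:02 and three at 22:50:03. Run over the full log, the same count could be lined up against the connection and descriptor counts to see when the process first ran out of resources.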

       

      Attached are diagnostic.data, the systemctl output, rs.status(), rs.conf(), and the mongod logs.

        1. mongo-jira.txt
          28 kB
        2. LOGS.tar.gz
          23.42 MB
        3. diagnostic_data.tar.gz
          57.90 MB
        4. mongo-9vb-27767.png
          177 kB
        5. mongo-GDB-10va-27737.png
          170 kB

            Assignee:
            lingzhi.deng@mongodb.com Lingzhi Deng
            Reporter:
            veramasu@hcl.com venkataramans rama
            Votes:
            0
            Watchers:
            13

              Created:
              Updated:
              Resolved: