Core Server: SERVER-70568

Latency spikes without any additional load

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: 4.4.10
    • Component/s: None
    • Operating System: ALL

      We have a MongoDB 4.4.10 sharded cluster with 10 shards in a PSS (primary-secondary-secondary) topology. From time to time (two or three times a day) we encounter huge latency spikes that can't be explained by the queries the cluster is currently serving. During a spike, CPU load on the affected shard's primary drops to near zero, and we see no increase in I/O load either. Everything running on that instance slows down, including diagnostic tools such as `atop` and `telegraf`; `atop` is missing a segment of time for the duration of the spike.
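To put a number on the stall that shows up as a missing segment in `atop`, a tiny probe process can record a timestamp at a fixed interval and flag any gap much longer than that interval. This is a minimal, generic sketch in Python, not a MongoDB or `atop` feature; `find_gaps`, `sample_loop`, the 0.1 s interval and the 2x threshold are all hypothetical choices.

```python
"""Sketch: detect host stalls as gaps between periodic timestamps.

Hypothetical helpers for illustration only. A probe that wakes every
`interval` seconds should produce evenly spaced timestamps; a gap much
larger than `interval` means the probe was not scheduled on time.
"""
import time
from typing import List, Tuple


def find_gaps(timestamps: List[float], interval: float,
              factor: float = 2.0) -> List[Tuple[float, float, float]]:
    """Return (start, end, length) for each gap longer than factor * interval."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = cur - prev
        if delta > factor * interval:
            gaps.append((prev, cur, delta))
    return gaps


def sample_loop(duration: float, interval: float = 0.1) -> List[float]:
    """Record a monotonic timestamp roughly every `interval` seconds for `duration` seconds."""
    stamps = []
    start = time.monotonic()
    while True:
        now = time.monotonic()
        stamps.append(now)
        if now - start >= duration:
            break
        time.sleep(interval)  # a stalled host makes this return late
    return stamps


if __name__ == "__main__":
    stamps = sample_loop(duration=2.0, interval=0.1)
    for start, end, length in find_gaps(stamps, interval=0.1):
        print(f"stall of {length:.3f}s between {start:.3f} and {end:.3f}")
```

Because the probe itself uses almost no CPU or I/O, a reported gap suggests the process simply was not scheduled, which would match the symptom of every tool on the instance slowing down at once.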

      We run our cluster on AWS EC2 instances, but the CloudWatch instance health metrics show nothing unusual. We recently doubled the size of the instances in our cluster, from r5d.4xlarge to r5d.8xlarge, but the problem persists and its frequency hasn't decreased. We run other MongoDB clusters, and they are performing well.

      I have attached an example FTDC log captured during a latency spike. The exact window is 09:03-09:12 UTC; it can be located by the spike in connection count.
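Once the connection counts have been pulled out of FTDC, the spike window can be located programmatically rather than by eye. A minimal sketch, assuming the samples are already available as (timestamp, `connections.current`) pairs from serverStatus; decoding FTDC itself is not shown, and `find_spike_windows` and the 2x-median threshold are hypothetical choices.

```python
"""Sketch: find contiguous windows where the connection count spikes.

Assumes samples were already extracted from FTDC as
(timestamp, connections.current) pairs; the threshold rule is hypothetical.
"""
from statistics import median
from typing import List, Optional, Tuple


def find_spike_windows(samples: List[Tuple[int, int]],
                       factor: float = 2.0) -> List[Tuple[int, int]]:
    """Return (first_ts, last_ts) for each run of samples above factor * median."""
    if not samples:
        return []
    baseline = median([value for _, value in samples])
    threshold = factor * baseline
    windows = []
    run_start: Optional[int] = None
    run_end: Optional[int] = None
    for ts, value in samples:
        if value > threshold:
            if run_start is None:
                run_start = ts  # first sample of a new spike
            run_end = ts        # extend the current spike
        elif run_start is not None:
            windows.append((run_start, run_end))
            run_start = run_end = None
    if run_start is not None:  # spike still open at the end of the data
        windows.append((run_start, run_end))
    return windows


if __name__ == "__main__":
    # Synthetic example: one sample per minute, spike from minute 3 to 11.
    data = [(t, 900 if 3 <= t <= 11 else 100) for t in range(20)]
    print(find_spike_windows(data))
```

The median is used as the baseline so that a short spike barely moves it; if spikes made up half or more of the samples, a fixed threshold would be a better choice.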

            Assignee:
            yuan.fang@mongodb.com Yuan Fang
            Reporter:
            sz Sergey Zagursky
            Votes:
            0
            Watchers:
            5
