- Type: Bug
- Resolution: Incomplete
- Priority: Trivial - P5
- None
- Affects Version/s: 4.4.0, 4.4.1
- Component/s: None
- Environment: Community version. GCP VMs with Ubuntu 18.04. 6x mongos, 3x configs, 45 (15 shards) mongod servers.
- ALL
We upgraded to 4.4.0 on Aug 25th. Right after the upgrade was complete, we noticed performance degradation across the entire system. Even queries by sharding key that return a single document from a small collection (~30k documents) started to slow down. For example, we saw cases where a log entry on the mongos side reported that a slow query took 6-7 sec, but there was nothing on the mongod side, which means the query took less than 100ms there. We looked for bottlenecks, temporarily resized instances, added more mongos instances, and even disabled TLS, but the results were the same. Then version 4.4.1 came out, but upgrading to it didn't change anything, so we decided to downgrade to the 4.2 release. This time, we monitored various parts of the system while the downgrade was performed. Performance recovered as soon as we restarted the mongos instances with the 4.2 binaries, and it stayed at the same level as we downgraded the shards one by one.
Interestingly, basic server metrics such as CPU, load, memory, and disk activity didn't change at all during the upgrade and downgrade. Only the mongos instances became slower with version 4.4, for some reason.
I'm not sure which metrics to share, as we didn't find anything that shows where the problem is. We do have metric history. If someone has an idea of which metrics could show something interesting, there is a chance we have them recorded, even though we don't monitor them in dashboards.
The attachment shows avg latency for a query by sharding key in the sharded collection with ~30k documents.
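To make the mongos-versus-mongod comparison described above repeatable, slow entries could be pulled out of each log and matched by time. The following is only a minimal sketch, assuming the 4.4 structured JSON log format in which slow operations appear with `msg` set to "Slow query" and a duration in `attr.durationMillis`. The helper name `slow_queries`, the constant `SLOW_MS`, and the sample lines are hypothetical and not taken from our logs.

```python
import json

# Assumption: 100 ms, matching the threshold mentioned in the report.
SLOW_MS = 100


def slow_queries(lines, threshold_ms=SLOW_MS):
    """Yield (timestamp, namespace, duration_ms) for slow-query log entries.

    Assumes one JSON document per line; lines that are not valid JSON
    or are not "Slow query" messages are skipped.
    """
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a structured log line
        if not isinstance(entry, dict) or entry.get("msg") != "Slow query":
            continue
        attr = entry.get("attr", {})
        duration = attr.get("durationMillis")
        if duration is not None and duration >= threshold_ms:
            yield entry.get("t", {}).get("$date"), attr.get("ns"), duration


# Synthetic example lines (illustrative only, not real log output).
sample = [
    '{"t":{"$date":"2020-09-01T10:00:00.000+00:00"},"s":"I","c":"COMMAND",'
    '"msg":"Slow query","attr":{"ns":"db.coll","durationMillis":6500}}',
    "not a json line",
    '{"msg":"Slow query","attr":{"ns":"db.coll","durationMillis":50}}',
]

for ts, ns, ms in slow_queries(sample):
    print(ts, ns, ms)
```

Running the same function over a mongos log and the corresponding mongod logs for the same time window would show whether the 6-7 sec entries have any counterpart on the shard side.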
- related to
- SERVER-52932 Increased operation times after upgrade to 4.4.1
- Closed