- Type: Bug
- Resolution: Works as Designed
- Priority: Major - P3
- None
- Affects Version/s: 3.4.16
- Component/s: Replication, Stability
- None
- Fully Compatible
- ALL
Hey guys, we observed some weird behaviour with the following setup:
All times are UTC
- 3-member replica set
- two bigger instances for failover - rs1-1 and rs1-2
- one smaller instance for backups
- Around 00:31 the primary rs1-1 had a major spike in memory usage.
- this is inferred from "Cannot allocate memory" messages in the syslog of the instance
- based on the mongo logs, there were no heavy queries running at the time
- After rs1-1 became unresponsive, rs1-2 became the new primary and had a similar memory usage spike around 00:37
- again inferred from the syslog
- again no big queries can be seen in the mongo log
- Both instances were unresponsive (not able to SSH, not reporting metrics) until they were restarted a few hours later
- Upon restart, rs1-1 crashed one more time around 06:44
- After the second crash I scaled up the machines and they have been running OK since then
You can see attached:
- mongo logs from both servers
- diagnostics.data from both servers
Let me know if you need any more information.
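For anyone who wants to reproduce how the spike times were inferred, here is a minimal sketch of the kind of syslog scan described above. It assumes the traditional syslog timestamp prefix (e.g. `Mon DD HH:MM:SS`); the function name `find_alloc_failures` is made up for illustration and is not part of any MongoDB tooling.

```python
import re
from typing import Iterable, List

# Assumption: lines start with a traditional syslog timestamp,
# e.g. "Jan  1 00:31:02". Other syslog formats will not match.
SYSLOG_TS = re.compile(r"^(\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})")


def find_alloc_failures(
    lines: Iterable[str], needle: str = "Cannot allocate memory"
) -> List[str]:
    """Return the timestamp of every line that contains `needle`.

    Lines that contain the message but have no recognisable
    timestamp prefix are reported as "unknown".
    """
    hits = []
    for line in lines:
        if needle in line:
            match = SYSLOG_TS.match(line)
            hits.append(match.group(1) if match else "unknown")
    return hits
```

It can be fed an open file object directly, for example `find_alloc_failures(open("/var/log/syslog", errors="replace"))`; the path is the usual Debian/Ubuntu default and may differ on other systems.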