- Type: Bug
- Resolution: Cannot Reproduce
- Priority: Major - P3
- None
- Affects Version/s: 3.4.5
- Component/s: Stability, WiredTiger
- Fully Compatible
- ALL
Hello,
I just suffered quite a bad issue with mongod 3.4.5 (WT). The requests were totally normal, nothing out of the ordinary, and suddenly the load on my server climbed to as much as 200, saturating all 8 CPUs.
At this point the server stopped responding to any request, but it apparently kept pinging the secondaries and syncing, since it stayed primary until I manually changed the priority from the secondary (I couldn't even SSH into the primary, as the load was killing the machine).
As I have had numerous problems of this kind in the past due to various performance issues in WT, cache eviction, etc. (SERVER-27700), I tried letting it rest to see if it would recover, but after 3 hours I had to hard reboot the server to get it back.
I checked the logs after the reboot: not a single line was logged during those 3 hours, and the lines before the crash show nothing unusual to me. I collected the diagnostic dir and can give it to you (along with the last hour of logs) if you send me your usual upload link.
If you can access my MongoDB Cloud Manager stats, the project id is 5012a0ac87d1d86fa8c22e64. Otherwise I can give you some screenshots, but there is nothing very interesting in them, as the charts were all totally normal until the agent stopped collecting data.
Thanks for your help
is related to
- SERVER-29980 Built-in hang detection diagnostics and recovery (Closed)
- SERVER-29947 Implement Storage Node Watchdog (Closed)