Type: Question
Resolution: Incomplete
Priority: Blocker - P1
Affects Version/s: 2.0.2
Component/s: Replication
Environment: AWS xLarge instance backed by RAID 10 EBS volumes
Replica set with 5 mongo boxes (slaveOk = disabled)
The mongod process suddenly becomes unresponsive. The box keeps accepting connections from other mongod servers and from applications (so the number of open sockets keeps increasing). While it is in this state, I cannot log in to the mongo console, and our regular stop script cannot stop the process either.
This has happened 5-6 times in the last week, on different servers (sometimes primaries, sometimes secondaries). Each time I have had to force-kill the process and then restart it.
This is the primary's log, where we can clearly see that read queries suddenly stopped being logged:
http://pastebin.com/VLciscv7
At the same time, the mongo-java-client reports a timeout exception: http://pastebin.com/CB7hEced
This is also not a case of slow queries; the queries take barely 100 ms. There are no spikes in CPU or load either (only in the total number of open sockets).
What could explain this behavior?
Let me know if more specific information is needed.
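
To illustrate the symptom described above (connections keep being accepted while the process does no work), here is a minimal sketch in Python. It is not taken from our setup: `probe_tcp` and `demo_hung_listener` are hypothetical helper names, and the listener is a stand-in for a hung server, not mongod itself. It only shows that a successful TCP connect does not prove the process is servicing requests, because the kernel completes the handshake for any listening socket that still has room in its backlog.

```python
import socket


def probe_tcp(host, port, timeout=2.0):
    """Classify a plain TCP connect attempt (hypothetical helper).

    Returns 'accepted', 'refused', or 'timeout'. 'accepted' only means the
    kernel completed the handshake, not that the server process is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "accepted"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"


def demo_hung_listener():
    """Stand-in for a hung server: it listens but never calls accept()."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(5)
    port = listener.getsockname()[1]
    try:
        # The connect succeeds even though nothing ever services it.
        return probe_tcp("127.0.0.1", port)
    finally:
        listener.close()


if __name__ == "__main__":
    print(demo_hung_listener())
```

A probe like this would therefore report the hung box as reachable; telling a hung mongod from a healthy one would need a request that expects a reply within a timeout, which matches the timeouts the Java client reports.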