Type: Bug
Resolution: Done
Priority: Major - P3
Affects Version/s: 2.6.11
Component/s: Admin
Hello,
We had an incident today with a three-node production replica set running MongoDB 2.6.11.
On average the primary holds roughly 180 connections. The count rose to 1800 and the primary became unreachable: mongotop, mongostat and the mongo shell could not connect to the server.
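Because the tools could no longer connect once the spike was under way, a check on the connection count only helps if it runs periodically and catches the growth early. A minimal sketch in Python, assuming a document shaped like the `connections.current` field of the `serverStatus` command output; `connection_spike`, the baseline of 180 and the 5x factor are hypothetical choices, not MongoDB settings.

```python
# Hypothetical helper: flags a connection spike from a serverStatus-style document.
# Only the "connections.current" field is read; the baseline and factor are
# illustrative values chosen for this report, not MongoDB defaults.


def connection_spike(status: dict, baseline: int, factor: float = 5.0) -> bool:
    """Return True when current connections reach `factor` times the baseline."""
    current = status["connections"]["current"]
    return current >= baseline * factor


if __name__ == "__main__":
    # Sample documents carrying only the field the helper reads.
    normal = {"connections": {"current": 180}}
    incident = {"connections": {"current": 1800}}
    print(connection_spike(normal, baseline=180))    # False
    print(connection_spike(incident, baseline=180))  # True

    # Against a live server (assumes pymongo is installed and mongod is reachable):
    # from pymongo import MongoClient
    # status = MongoClient("mongodb://localhost:27017").admin.command("serverStatus")
    # print(connection_spike(status, baseline=180))
```

With a baseline of 180 and a factor of 5, the alert threshold is 900 connections, well below the 1800 seen during the incident.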
After several minutes with no idea where the connections were coming from, I killed the mongod process on the primary to force an election. The election went smoothly (as always), and the replica set is now working normally again, with a new primary and a normal number of open connections.
We first thought one of our app servers had been compromised (MongoDB is not reachable from outside our EC2 security group), but since the problem disappeared when the primary changed, it looks as if mongod suddenly stopped handling closed connections.
We don't use threads on the application server side, but we open and close connections very frequently because we use Resque for background processing, and each new job opens its own connection.
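The hypothesis above can be illustrated with a toy model. This is purely illustrative Python, not how mongod tracks connections internally: `ToyServer` and `run_jobs` are hypothetical names, each job opens and closes one connection as a per-job worker would, and 1620 jobs is chosen only so that the numbers match the 180 to 1800 rise described earlier.

```python
from dataclasses import dataclass


@dataclass
class ToyServer:
    """Toy model of a server's open-connection counter (not MongoDB code)."""

    current: int = 0
    handles_close: bool = True  # False models the suspected failure mode

    def open(self) -> None:
        self.current += 1

    def close(self) -> None:
        # A healthy server releases the connection when the client closes it.
        if self.handles_close:
            self.current -= 1


def run_jobs(server: ToyServer, jobs: int) -> int:
    """Each job opens and closes its own connection; returns the final count."""
    for _ in range(jobs):
        server.open()
        # The job's actual work would run here.
        server.close()
    return server.current


if __name__ == "__main__":
    healthy = ToyServer(current=180)
    print(run_jobs(healthy, jobs=1620))  # 180: closes keep the count flat
    stuck = ToyServer(current=180, handles_close=False)
    print(run_jobs(stuck, jobs=1620))    # 1800: every job leaks one connection
```

In the healthy case the count stays flat no matter how many jobs run; if closes stop being processed, the count grows by one per job, which is the pattern the report describes.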
It took us a while to decide to step down the primary, because we were convinced the surge in connections was coming from our app servers, but it turned out to be the right thing to do.
Is this a known behaviour in MongoDB?