- Type: Bug
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: 1.4.1
- Component/s: None
- Fully Compatible
Recently, we started getting occasional unexplained "hangs" that coincided with hitting the connection pool upper limit. I spent some time debugging but could not reproduce the problem on a test machine, so I waited until one of our servers hung, then took a dump of the app server process and loaded it into WinDbg.
I'll spare you the details, but the culprit in our case lies in the MongoServer.RequestStart method. RequestStart locks on _serverLock, then calls MongoServerInstance.AcquireConnection, which in turn calls MongoConnectionPool.AcquireConnection. When the connection pool limit has already been reached, MongoConnectionPool.AcquireConnection starts waiting on _connectionPoolLock with a timeout (the wait queue), and it does so while _serverLock is still held. Unfortunately, MongoServer.ReleaseConnection() also locks on MongoServer._serverLock, so no connection can be released back to the pool while that wait is in progress. As a result, connection management is stalled for the duration of WaitQueueTimeout.
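The locking pattern described above can be reproduced in isolation. The following is a minimal Python sketch, not the driver's C# code: the `Pool` and `Server` classes, their method names, and `run_demo` are hypothetical stand-ins for MongoConnectionPool, MongoServer, RequestStart, and ReleaseConnection. It assumes a pool of size 1 and shows that a release is blocked until the waiting request times out.

```python
import threading
import time


class Pool:
    """Toy bounded pool (hypothetical stand-in for MongoConnectionPool)."""

    def __init__(self, max_size):
        self._cond = threading.Condition()  # plays the role of _connectionPoolLock
        self._available = max_size

    def acquire(self, timeout):
        with self._cond:
            # Wait queue: block until a connection is free or the timeout expires.
            if not self._cond.wait_for(lambda: self._available > 0, timeout):
                raise TimeoutError("wait queue timeout")
            self._available -= 1

    def release(self):
        with self._cond:
            self._available += 1
            self._cond.notify()


class Server:
    """Toy server (hypothetical stand-in for MongoServer)."""

    def __init__(self, pool, wait_queue_timeout):
        self._server_lock = threading.Lock()  # plays the role of _serverLock
        self._pool = pool
        self._timeout = wait_queue_timeout

    def request_start(self):
        with self._server_lock:
            # Problem: the pool wait happens while the server lock is held.
            self._pool.acquire(self._timeout)

    def release_connection(self):
        with self._server_lock:  # blocked while a request waits on the pool
            self._pool.release()


def run_demo(wait_queue_timeout=0.5):
    server = Server(Pool(max_size=1), wait_queue_timeout)
    server.request_start()  # the first request takes the only connection
    result = {}

    def second_request():
        try:
            server.request_start()
            result["second"] = "acquired"
        except TimeoutError:
            result["second"] = "timed out"

    worker = threading.Thread(target=second_request)
    worker.start()
    # Demo only: peek at the lock to wait until the second request holds it.
    while not server._server_lock.locked():
        time.sleep(0.01)

    started = time.monotonic()
    server.release_connection()  # cannot proceed until the waiter gives up
    result["release_blocked_for"] = time.monotonic() - started
    worker.join()
    return result


if __name__ == "__main__":
    print(run_demo())
```

In this model the second request times out even though the first request was ready to give its connection back immediately, and the release itself is delayed by roughly the wait queue timeout.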
Other suspicious methods (ones that access the connection pool to acquire connections) include MongoServer.VerifyState and MongoServer.ChooseServerInstance (due to its call to MongoServer.VerifyUnknownState). Note that while I think they may contain a similar locking pattern, I'm not certain, and I have not yet observed problems related to these two methods (although VerifyState certainly looks like it has the same problem).
I would like to note that this is a very disruptive issue, because sooner or later it brings down any server that approaches a certain load. The most obvious workaround is to increase the connection pool limit, and that appears to solve the issue, but it doesn't feel like a proper long-term solution.
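For illustration, one possible shape of a longer-term fix is to avoid holding the server lock while waiting on the pool. The sketch below is a hypothetical Python model, not the driver's actual code or its eventual fix: `Pool`, `Server`, `_active_requests`, and `run_demo` are names invented here, and the sketch assumes that the server lock only needs to protect the server's own bookkeeping.

```python
import threading
import time


class Pool:
    """Toy bounded pool (hypothetical stand-in for MongoConnectionPool)."""

    def __init__(self, max_size):
        self._cond = threading.Condition()
        self._available = max_size

    def acquire(self, timeout):
        with self._cond:
            if not self._cond.wait_for(lambda: self._available > 0, timeout):
                raise TimeoutError("wait queue timeout")
            self._available -= 1

    def release(self):
        with self._cond:
            self._available += 1
            self._cond.notify()


class Server:
    """Toy server that never waits on the pool while holding its own lock."""

    def __init__(self, pool, wait_queue_timeout):
        self._server_lock = threading.Lock()
        self._pool = pool
        self._timeout = wait_queue_timeout
        self._active_requests = 0  # hypothetical state guarded by the server lock

    def request_start(self):
        # Wait for a connection first, without holding the server lock.
        self._pool.acquire(self._timeout)
        with self._server_lock:  # held only briefly, for bookkeeping
            self._active_requests += 1

    def release_connection(self):
        with self._server_lock:
            self._active_requests -= 1
        self._pool.release()  # wakes one waiter, if any


def run_demo(wait_queue_timeout=2.0):
    server = Server(Pool(max_size=1), wait_queue_timeout)
    server.request_start()  # the first request takes the only connection
    outcome = {}

    def second_request():
        started = time.monotonic()
        try:
            server.request_start()
            outcome["second"] = "acquired"
        except TimeoutError:
            outcome["second"] = "timed out"
        outcome["waited"] = time.monotonic() - started

    worker = threading.Thread(target=second_request)
    worker.start()
    time.sleep(0.1)              # give the second request time to start waiting
    server.release_connection()  # not blocked by the waiting request
    worker.join()
    return outcome


if __name__ == "__main__":
    print(run_demo())
```

With this ordering a waiting request no longer prevents releases, so the second request obtains the connection as soon as the first one returns it instead of waiting out the timeout. Whether the real driver can reorder its locking this way depends on what _serverLock is meant to protect, which this sketch does not model.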