- Type: Bug
- Resolution: Duplicate
- Priority: Critical - P2
- None
- Affects Version/s: None
- Component/s: Sharding
- None
ALL
We had the following issue on our production environment today:
Due to a mistake, a mongod process needed to be restarted. This caused the replica set to fail over, and the secondary member became primary.
However, after the freshly restarted mongod came back up, another election was held and it was re-elected primary.
From that point on, it was no longer possible to query a non-sharded DB that resides on the replica set that experienced the restart.
Connecting to mongos and trying to query the database returned the following error in the mongo shell:
[code]
mongos> db.collection.find()
error:
[code]
After manually retrying the query by repeating the command over and over (between 20 and 40 times) in the mongo shell, the situation eventually cleared up and queries worked normally again, both from the shell and from our application. Unfortunately, this process had to be repeated for every mongos instance in the cluster, six in total.
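The manual workaround above can be scripted. The following is only a minimal sketch of that retry loop, not a fix for the underlying problem. `retryUntilSuccess` is a hypothetical helper name, and it assumes the failing operation throws an exception when it hits a broken connection.

```javascript
// Hypothetical helper: call `operation` until it stops throwing,
// giving up after `maxAttempts` tries.
function retryUntilSuccess(operation, maxAttempts) {
  var lastError = null;
  for (var attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // Report how many attempts were needed along with the result.
      return { attempts: attempt, result: operation() };
    } catch (e) {
      // Assumption: the failure is a stale connection, so simply try again.
      lastError = e;
    }
  }
  throw new Error("still failing after " + maxAttempts + " attempts: " + lastError);
}
```

In a shell connected to one mongos, this might be called as `retryUntilSuccess(function () { return db.collection.find().itcount(); }, 40)`, where 40 is the upper end of the retry count we observed. It would still have to be run once per mongos instance.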
It looks to me as if mongos does not check connections to the cluster's other members before using them.
Is it possible to add that functionality?
It wouldn't need to check before every use of the connection (though that behaviour might be desirable in some cases, the same way it works when connecting to SQL databases from Java using JDBC connection pools), but the administrator shouldn't have to sort this out manually.
Or is it already there and we just haven't found the switch for it yet?
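To illustrate the behaviour being requested, here is a minimal sketch of a pool that validates connections when they are borrowed. It does not reflect how mongos manages its connections. `ValidatingPool`, `isAlive`, `createConnection`, `borrow` and `giveBack` are hypothetical names chosen for this example.

```javascript
// Hypothetical pool: connections are plain objects exposing isAlive().
function ValidatingPool(createConnection) {
  this.idle = [];
  this.createConnection = createConnection;
}

// Hand out an idle connection only if it passes the liveness check.
// Dead connections are discarded; if none are usable, open a fresh one.
ValidatingPool.prototype.borrow = function () {
  while (this.idle.length > 0) {
    var conn = this.idle.pop();
    if (conn.isAlive()) {
      return conn;
    }
    // Dead connection: drop it and look at the next idle one.
  }
  return this.createConnection();
};

// Return a connection to the pool for later reuse.
ValidatingPool.prototype.giveBack = function (conn) {
  this.idle.push(conn);
};
```

With a check like this, connections broken by a restart would be discarded on first use instead of surfacing as query errors. The cost is one extra liveness check per borrow, which is why a periodic or on-failure check may be preferable to validating every time.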
- duplicates
- SERVER-4706 when a socket between mongos and mongod fails, close all connections immediately
- Closed
- related to
- SERVER-9041 proactively detect broken connections detected by the network
- Closed