- Type: Task
- Resolution: Done
- Affects Version/s: None
- Component/s: None
I have a replica set of three machines and I am testing the failover capabilities of my system, but there are some bumps in the road.
When I stop mongod on one of the machines, the client gets Errno::ECONNREFUSED (which is pretty fast) and keeps running.
But when I take the whole box down, the client gets one of these errors: Errno::EHOSTDOWN, Errno::EHOSTUNREACH, or Errno::ETIMEDOUT, and each of them takes a while to surface. mongoid/moped then tries to connect to the down host every time and always gets the same exception, from which it does not recover (that is, it does not send the query to the surviving nodes).
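To make the behaviour I am hoping for concrete, here is a minimal sketch in plain Ruby. It is not moped's code: `with_first_reachable` and the injectable `connect` callable are hypothetical names of mine, and the 2-second connect timeout is an arbitrary assumption. It only shows the idea of treating the host-down errors like ECONNREFUSED and moving on to the next node.

```ruby
require "socket"

# Errors from the report that indicate a node cannot be reached.
NODE_DOWN_ERRORS = [
  Errno::ECONNREFUSED,
  Errno::EHOSTDOWN,
  Errno::EHOSTUNREACH,
  Errno::ETIMEDOUT
].freeze

# Hypothetical helper (not part of moped): tries each "host:port" seed in
# order and yields the first connection that succeeds. `connect` can be any
# callable; by default it opens a TCP socket with a bounded connect timeout
# so that a powered-off box fails quickly instead of hanging.
def with_first_reachable(seeds, connect_timeout: 2, connect: nil)
  connect ||= lambda do |host, port|
    Socket.tcp(host, port, connect_timeout: connect_timeout)
  end
  failures = {}

  seeds.each do |seed|
    host, port = seed.split(":")
    begin
      conn = connect.call(host, Integer(port))
    rescue *NODE_DOWN_ERRORS => e
      # Remember why this node was skipped and try the next one.
      failures[seed] = e.class
      next
    end
    return yield(conn, seed)
  end

  raise IOError, "no reachable node: #{failures.inspect}"
end
```

With something like this, a dead box costs at most one bounded connect attempt per call instead of a long OS-level timeout followed by an unrecovered exception.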
I tried playing with the :down_interval option for moped, but it does not work the way I expect. I thought it set a period of time during which moped would not try to reconnect to a down node, giving us time to recover.
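For clarity, this is the semantics I assumed :down_interval would have, written as a small stand-alone Ruby sketch. `DownNodeTracker` and its methods are hypothetical names for illustration only and say nothing about how moped actually implements the option; the clock is injectable purely so the behaviour is easy to check.

```ruby
# Hypothetical illustration of the expected semantics (not moped's code):
# a node marked down is skipped until `down_interval` seconds have elapsed.
class DownNodeTracker
  def initialize(down_interval:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @down_interval = down_interval
    @clock = clock
    @down_since = {}
  end

  # Record the moment a node was seen failing.
  def mark_down(node)
    @down_since[node] = @clock.call
  end

  # Forget a previous failure, for example after a successful reconnect.
  def mark_up(node)
    @down_since.delete(node)
  end

  # A node is eligible if it was never marked down or its interval has passed.
  def eligible?(node)
    since = @down_since[node]
    since.nil? || (@clock.call - since) >= @down_interval
  end

  # Nodes that may be tried right now, in their original order.
  def eligible(nodes)
    nodes.select { |node| eligible?(node) }
  end
end
```

Under this reading, with an interval of 30 seconds, queries issued during the first 30 seconds after a failure would go only to the two surviving nodes, and the down node would be retried once after that.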
How do I handle this situation with minimal impact on the system's performance?