- Type: Bug
- Resolution: Gone away
- Priority: Unknown
- Affects Version/s: 1.9.1
- Component/s: Connections, Error Handling
Summary
About once a day, we see an error like this:
connection() error occurred during connection handshake: dial tcp: lookup foo-bar-mongos.svc.cluster.local on 169.254.25.10:53: no such host
We are using version 1.9.1 of the mongo driver with the following setup:
- sharded cluster
- mongos instances run as an auto-scaled pool
- access to mongos is via an SRV record
Because these errors are relatively rare, we assume they occur when one of the mongos instances is either starting up or shutting down.
Our guess is that the issue is a race between the SRV and A records, possibly compounded by DNS caching: the SRV record still lists (or already lists) a mongos host whose A record does not resolve at that moment. This seems like the kind of issue that is better handled inside the driver itself.
At this time we cannot propose a trivial way to reproduce (WTR) this issue. If we can help diagnose it, for example by enabling verbose logs and sending them to you, please let us know how.