- Type: Improvement
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Error Handling
- None
New Description:
If there's a configuration error that prevents any connection handshakes from succeeding (e.g. configuring a Client with no TLS enabled when the server requires TLS connections), all operations will fail with a server selection timeout error. The error message will be something like:
server selection error: server selection timeout, current topology: { Type: Unknown, Servers: [{ Addr: localhost:27017, Type: Unknown, State: Connected, Average RTT: 0, Last error: connection() : connection(localhost:27017[-10]) incomplete read of message header: EOF },] }
The error reported from Last error: ... is of type ConnectionError and could be improved. Some ideas:
- Add handshaking state to the error message. It'd be useful to know that this failure was during handshaking rather than a regular operation because it clarifies that the connection was never established successfully.
- The incomplete read of message header text signals that the initial 4-byte read from the socket failed. If the read returned (0, io.EOF), this message could be changed to something like socket was unexpectedly closed, to indicate that the server hung up the connection.
The previous description proposed failing fast on TLS errors rather than waiting for server selection to time out. We won't be doing this, for two reasons. First, there are edge cases where only a subset of servers is unreachable due to TLS errors. Second, some TLS errors can be transient (e.g. OCSP responses are cached, so the response may change after the cached version expires). It's therefore important that we block for the server selection period and report the full state of all servers in the error message.
Previous Description:
When there is a TLS error, such as connecting without TLS to a server that requires it (or vice versa) or connecting with an invalid certificate, the Go driver eventually fails server selection with an error but gives no indication of why. This is going to confuse users and increase support inquiries.
As we have no TLS spec, we never actually say anywhere that a TLS error needs to be reported, but it can be inferred from both the auth spec (handshake errors are auth errors, which need to fail fast and be reported with details) and the server selection spec (which calls for "useful" error messages when selection fails).
Is depended on by:
- TOOLS-1833 Migrate tools (excluding mongoreplay) to new Go Driver (Development Complete)
Is related to:
- TOOLS-2299 Clearer error than "error dialing host:port: Host validation error" (Closed)
- DRIVERS-2421 Drivers should include topology description in server selection timeout errors (Closed)
- GODRIVER-733 Add diagnostic information to server selection errors (Closed)