If a network error occurs before the ismaster handshake completes, the SDAM spec says the client should invalidate the server description, provided the connection's generation is valid:
If there is a network error or timeout on the connection before the handshake completes, the client MUST replace the server's description with a default ServerDescription of type Unknown, and fill the ServerDescription's error field with useful information.
The current behavior does not match the spec.
_mongoc_stream_run_ismaster uses the current server description and runs ismaster with mongoc_cluster_run_command_private. That function handles network errors as if they were post-handshake errors (it invalidates the server unless the error is a timeout).
When that error bubbles up to _mongoc_cluster_stream_for_server, the server is invalidated a second time.
I believe this is a long-standing issue, and in practice multiple invalidations may not be terribly problematic. Here is one scenario in which an extra invalidation could happen:
- Thread A creates a connection with generation 0.
- Thread B receives a network error and invalidates the server, incrementing the generation to 1.
- Thread A begins the handshake, calling _mongoc_stream_run_ismaster, which retrieves the server description with generation 1.
- Thread A receives a network error when performing the handshake, thinks it has the latest generation (though it really doesn't), and invalidates again.
The introduction of a connection generation in CDRIVER-3615 should prevent this behavior (only the first invalidation wins). Unfortunately, two things defeat it. First, the server description retrieved by _mongoc_stream_run_ismaster could have a later generation than the one that was current when the stream was created, which causes a double invalidation. Second, mongoc_cluster_run_command_private does not check the generation at all, so it always invalidates.
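
To make the intended "first invalidation wins" rule concrete, here is a minimal, self-contained sketch in C. It is not libmongoc code: the sketch_* types and functions are hypothetical stand-ins, and it assumes the generation is captured once when the connection is created and is compared against the server's current generation before any invalidation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a server description; not the libmongoc type. */
typedef struct {
   uint32_t generation; /* incremented on every invalidation */
   bool known;          /* false once the server is marked Unknown */
} sketch_server_t;

/* Hypothetical stand-in for a connection (stream). */
typedef struct {
   uint32_t generation; /* server generation captured at creation time */
} sketch_conn_t;

/* Capture the generation when the connection is created, not later
 * when the handshake happens to run. */
sketch_conn_t
sketch_conn_new (const sketch_server_t *server)
{
   sketch_conn_t conn;

   assert (server);
   conn.generation = server->generation;
   return conn;
}

/* Invalidate the server only if the failing connection belongs to the
 * current generation. Returns true if the invalidation was applied. */
bool
sketch_handle_network_error (sketch_server_t *server, const sketch_conn_t *conn)
{
   assert (server && conn);
   if (conn->generation < server->generation) {
      return false; /* stale connection: someone already invalidated */
   }
   server->known = false;
   server->generation++;
   return true;
}

/* Replays the Thread A / Thread B scenario sequentially and returns the
 * final server generation. */
uint32_t
sketch_replay_scenario (void)
{
   sketch_server_t server = {0, true};
   sketch_conn_t conn_a = sketch_conn_new (&server); /* Thread A, generation 0 */
   sketch_conn_t conn_b = sketch_conn_new (&server); /* Thread B, generation 0 */

   sketch_handle_network_error (&server, &conn_b); /* applied: generation 0 -> 1 */
   sketch_handle_network_error (&server, &conn_a); /* stale: ignored */
   return server.generation;
}
```

In this sketch Thread A's handshake error is ignored because its connection still carries generation 0 while the server is already at generation 1. If the generation were instead read from the server description at handshake time, the comparison would pass and the server would be invalidated twice.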