  C Driver / CDRIVER-3654

Pooled handshake does not handle network errors correctly

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: libmongoc, network

      If a network error occurs before the ismaster handshake completes, the SDAM (Server Discovery and Monitoring) specification says the server description should be invalidated, provided the connection's generation is still valid:

      If there is a network error or timeout on the connection before the handshake completes, the client MUST replace the server's description with a default ServerDescription of type Unknown, and fill the ServerDescription's error field with useful information.
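
      As a rough illustration of the rule quoted above, the sketch below resets a server description to a default Unknown one and records the error. The toy_ types and the toy_mark_unknown function are hypothetical stand-ins invented for this example; they are not libmongoc API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified server description; not the libmongoc struct. */
typedef enum { TOY_SERVER_UNKNOWN, TOY_SERVER_STANDALONE } toy_server_type_t;

typedef struct {
   toy_server_type_t type;
   char error[128]; /* empty string means "no error recorded" */
} toy_server_description_t;

/* Replace the description with a default Unknown one and store a readable
 * error message, as the quoted rule asks for pre-handshake failures. */
static void
toy_mark_unknown (toy_server_description_t *sd, const char *reason)
{
   memset (sd, 0, sizeof *sd); /* back to defaults; Unknown is value 0 */
   sd->type = TOY_SERVER_UNKNOWN;
   snprintf (sd->error, sizeof sd->error, "%s", reason);
}
```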

      The current behavior does not quite match this: a failed handshake can invalidate the server twice.

      _mongoc_stream_run_ismaster uses the current server description and runs the ismaster command with mongoc_cluster_run_command_private. That function handles network errors as if they were post-handshake errors (it invalidates the server unless the error is a timeout).

      When that error bubbles up to _mongoc_cluster_stream_for_server, it ends up invalidating the server again.

      I believe this is a long-standing issue, and in practice multiple invalidations may not be terribly problematic. Here is one scenario in which they could occur:

      • Thread A creates a connection with generation 0.
      • Thread B receives a network error and invalidates the server, incrementing the generation to 1.
      • Thread A begins the handshake, calling _mongoc_stream_run_ismaster, which retrieves the server description with generation 1.
      • Thread A receives a network error when performing the handshake, thinks it has the latest generation (though it really doesn't), and invalidates again.
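
      The four steps above can be replayed sequentially in a toy model. This is only a sketch: toy_server_t, toy_conn_t, toy_invalidate and toy_replay_scenario are hypothetical names, the two threads are simulated by running their steps in order, and no libmongoc code is involved.

```c
#include <stdint.h>

/* Hypothetical toy model of one server entry in the topology. */
typedef struct {
   uint32_t generation; /* bumped on every invalidation */
   int invalidations;   /* how many times the server was invalidated */
} toy_server_t;

typedef struct {
   uint32_t generation; /* server generation when the connection was created */
} toy_conn_t;

static void
toy_invalidate (toy_server_t *server)
{
   server->generation++;
   server->invalidations++;
}

/* Replays the scenario step by step and returns the invalidation count. */
static int
toy_replay_scenario (toy_server_t *server, toy_conn_t *conn_a)
{
   uint32_t seen_by_a;

   /* Thread A creates a connection in the current generation. */
   conn_a->generation = server->generation;

   /* Thread B receives a network error and invalidates the server. */
   toy_invalidate (server);

   /* Thread A begins the handshake and re-reads the server description,
    * picking up the newer generation instead of its own. */
   seen_by_a = server->generation;

   /* Thread A's handshake fails. The re-read generation always matches,
    * so the staleness check passes and the server is invalidated again. */
   if (seen_by_a == server->generation) {
      toy_invalidate (server);
   }

   return server->invalidations;
}
```

      Starting from generation 0, the replay ends with two invalidations, even though Thread A's connection still carries generation 0 and was already stale when its handshake failed.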

      The connection generation introduced in CDRIVER-3615 should prevent this behavior (only the first invalidation wins). Unfortunately, two things defeat it: the server description retrieved by _mongoc_stream_run_ismaster can have a later generation than the one in effect when the stream was created (causing a double invalidation), and mongoc_cluster_run_command_private does not check the generation at all (so it always invalidates).
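
      One possible shape of a fix, offered only as a sketch and not as the change libmongoc actually made: capture the generation when the stream is created and route every invalidation through a single checked helper. The names toy_checked_server_t and toy_invalidate_if_current are hypothetical, and the sketch is single-threaded; real code would need to hold the topology lock around the comparison and the update.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model of one server entry; not a libmongoc type. */
typedef struct {
   uint32_t generation; /* bumped on every successful invalidation */
   int invalidations;
} toy_checked_server_t;

/* Invalidate only if the failing connection belongs to the server's current
 * generation. conn_generation must be the value captured when the stream was
 * created, not one re-read from the server description later. */
static bool
toy_invalidate_if_current (toy_checked_server_t *server,
                           uint32_t conn_generation)
{
   if (conn_generation != server->generation) {
      return false; /* stale connection: an earlier error already won */
   }
   server->generation++;
   server->invalidations++;
   return true;
}
```

      With two connections both created at generation 0, the first failure invalidates the server and the second is ignored, which is the "only the first invalidation wins" behavior described above.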

            Assignee: Unassigned
            Reporter: Kevin Albertson (kevin.albertson@mongodb.com)
            Votes: 0
            Watchers: 3
