Uploaded image for project: 'C Driver'
  1. C Driver
  2. CDRIVER-3636

maxWireVersion check for MaxStalenessSeconds does not exclude PossiblePrimary

    • Type: Icon: Bug Bug
    • Resolution: Fixed
    • Priority: Icon: Major - P3 Major - P3
    • 1.19.1
    • Affects Version/s: 1.16.2
    • Component/s: None
    • Environment:

      Summary
      The Max Staleness specification requires drivers check wire version for support of maxStalenessSeconds:

      Clients are required to throw an error if maxStalenessSeconds is set, and any server in the topology has maxWireVersion less than 5.

      Servers of type PossiblePrimary must not be considered in this check, since they have no known maxWireVersion.

      libmongoc does not exclude PossiblePrimary in mongoc_topology_description_lowest_max_wire_version. This may result in incorrect errors when MaxStalenessSeconds is specified in the read preference. If there is a PossiblePrimary server in the topology description, server selection will fail until the server is rediscovered.

      Original report

      Recently we've added maxStalenessSeconds option to PHP Driver configuration and noticed occasional erroneous log messages.

      Not all servers support maxStalenessSeconds message is displayed in server logs.
      All replica set members are 4.2.5 and that should not happen.
      Before Not all servers support maxStalenessSeconds message we usually get a few messages like No suitable servers found (`serverSelectionTryOnce` set): [connection refused calling ismaster on 'xxx'] and No servers yet eligible for rescan.
      We suppose that previous messages are caused by our 100ms timeout, so they are expected to happen from time to time under load.

      But Not all servers support maxStalenessSeconds should not happen IMHO.
      Digging down to code, I've prepared a reproduction script (it's in PHP, but the problem lies in libmongoc, as I understand).

      <?php
      
      use MongoDB\Driver\Query;
      use MongoDB\Driver\ReadPreference;
      
      ini_set('mongodb.debug', '/var/tmp');
      
      $manager = new MongoDB\Driver\Manager('mongodb://localhost:27100/?replicaSet=replset&compressors=snappy&tls=false&socketTimeoutMS=2&connectTimeoutMS=2&serverSelectionTimeoutMS=2&heartbeatFrequencyMS=500');
      $readPreference = new ReadPreference(ReadPreference::RP_SECONDARY, null, ['maxStalenessSeconds' => 90]);
      $query = new Query(['k' => 'v']);
      
      while (true) {
          try {
              $doc = $manager->executeQuery('delay.coll', $query, ['readPreference' => $readPreference]);
          } catch (\Exception $e) {
              if ($e->getMessage() === 'Not all servers support maxStalenessSeconds') {
                  echo "got it!\n";
                  sleep(86400);
              }
          }
      }
      
      

      Running a few instances of this script under debugger I was able to stop at bp set at https://github.com/mongodb/mongo-c-driver/blob/1.16.2/src/libmongoc/src/mongoc/mongoc-topology.c#L655

      Here is a lldb dump out of there:

      n734:mongodb-driver $ lldb php
      (lldb) target create "php"
      Current executable set to 'php' (x86_64).
      (lldb) b mongoc-topology.c:655
      Breakpoint 1: no locations (pending).
      WARNING:  Unable to resolve breakpoint to any actual locations.
      (lldb) r delay.php
      Process 94260 launched: '/usr/local/bin/php' (x86_64)
      1 location added to breakpoint 1
      
      mongodb.so was compiled with optimization - stepping may behave oddly; variables may not be available.
      Process 94260 stopped
      * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
          frame #0: 0x0000000105a5b7c5 mongodb.so`mongoc_topology_compatible(td=0x0000000105032000, read_prefs=<unavailable>, error=0x00007ffeefbfe390) at mongoc-topology.c:655:10 [opt]
         652           mongoc_topology_description_lowest_max_wire_version (td);
         653  
         654        if (max_wire_version < WIRE_VERSION_MAX_STALENESS) {
      -> 655           bson_set_error (error,
         656                           MONGOC_ERROR_COMMAND,
         657                           MONGOC_ERROR_PROTOCOL_BAD_WIRE_VERSION,
         658                           "Not all servers support maxStalenessSeconds");
      Target 0: (php) stopped.
      (lldb) bt all
      * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
        * frame #0: 0x0000000105a5b7c5 mongodb.so`mongoc_topology_compatible(td=0x0000000105032000, read_prefs=<unavailable>, error=0x00007ffeefbfe390) at mongoc-topology.c:655:10 [opt]
          frame #1: 0x0000000105a5ba52 mongodb.so`mongoc_topology_select_server_id(topology=0x0000000105032000, optype=MONGOC_SS_READ, read_prefs=0x0000000103cd95d0, error=0x00007ffeefbfe390) at mongoc-topology.c:859:15 [opt]
          frame #2: 0x0000000105a5b826 mongodb.so`mongoc_topology_select(topology=0x0000000105032000, optype=<unavailable>, read_prefs=<unavailable>, error=0x00007ffeefbfe390) at mongoc-topology.c:737:7 [opt]
          frame #3: 0x0000000105a22549 mongodb.so`mongoc_client_select_server(client=0x0000000103cdae00, for_writes=false, prefs=0x0000000103cd95d0, error=0x00007ffeefbfe390) at mongoc-client.c:2758:9 [opt]
          frame #4: 0x0000000105a92c9f mongodb.so`php_phongo_manager_select_server(for_writes=false, zreadPreference=0x000000010585fa40, zsession=<unavailable>, client=0x0000000103cdae00, server_id=0x00007ffeefbfe60c) at Manager.c:295:20 [opt]
          frame #5: 0x0000000105a925c6 mongodb.so`zim_Manager_executeQuery(execute_data=<unavailable>, return_value=0x0000000105817170) at Manager.c:573:7 [opt]
          frame #6: 0x0000000103dd51b1 xdebug.so`xdebug_execute_internal + 465
          frame #7: 0x00000001003d748a php`ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER + 380
          frame #8: 0x00000001003bbafb php`execute_ex + 35
          frame #9: 0x0000000103dd4f54 xdebug.so`xdebug_execute_ex + 1044
          frame #10: 0x00000001003bbcb3 php`zend_execute + 352
          frame #11: 0x000000010037eb66 php`zend_execute_scripts + 277
          frame #12: 0x00000001003272e5 php`php_execute_script + 482
          frame #13: 0x000000010040543e php`do_cli + 3833
          frame #14: 0x00000001004043a3 php`main + 1211
          frame #15: 0x00007fff68bfbcc9 libdyld.dylib`start + 1
      (lldb) fr v
      (const mongoc_topology_description_t *) td = 0x0000000105032000
      (const mongoc_read_prefs_t *) read_prefs = <variable not available>
      
      (bson_error_t *) error = 0x00007ffeefbfe390
      (int64_t) max_staleness_seconds = 90
      (int32_t) max_wire_version = 0
      

      As you see max_wire_version is zero. I think that despite mongoc_topology_description_lowest_max_wire_version doc says that owning toplogy mutex should be held, neither mongoc_topology_compatible nor https://github.com/mongodb/mongo-c-driver/blob/1.16.2/src/libmongoc/src/mongoc/mongoc-topology.c#L859 holds mutex.

            Assignee:
            kevin.albertson@mongodb.com Kevin Albertson
            Reporter:
            neanton@gmail.com Anton Neznaienko
            Votes:
            1 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated:
              Resolved: