Core Server / SERVER-30600

mongos does not detect stale config when clients use non-primary read preferences

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: 3.4.5
    • Component/s: Querying, Sharding
    • ALL

      Stand up the following MongoDB deployment:

      • Two 3-node replica sets
      • Two mongos instances
      • One config server replica set

      Create an unsharded database and populate a collection with enough test data that it will be split into multiple chunks once sharded.
      From the first mongos, shard the collection and wait for the balancer to migrate a chunk to the second shard.
      Using the second mongos, query by shard key for data in the migrated chunk with readPreference=secondary. 0 results are returned.
      Using the second mongos, run the same shard key query with readPreference=primary. The correct results are returned.
      Using the second mongos, run the shard key query again with readPreference=secondary. The correct results are now returned.


      Mongos instances which do not receive any requests with the primary read preference do not get their chunk location configuration updated after a chunk migration. This results in missing data in query results in cases where the query includes the shard key and the mongos routes the query to the wrong shard.
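The behaviour described above can be illustrated with a small toy model. This is only a sketch of the reported symptom, not MongoDB's implementation: the `Cluster` and `Router` classes are hypothetical, and the version check is collapsed into the router for brevity (in a real cluster the shard rejects the request with a stale-config error). The one modelled assumption, taken from this report, is that only primary reads trigger a refresh of the cached chunk locations.

```python
class Cluster:
    """Authoritative chunk ownership, standing in for the config servers."""

    def __init__(self, shard_names):
        self.shards = {name: {} for name in shard_names}  # shard -> {key: doc}
        self.owner = {}                                   # key -> shard name
        self.version = 1

    def insert(self, key, doc, shard):
        self.shards[shard][key] = doc
        self.owner[key] = shard

    def migrate(self, key, to_shard):
        # Move the document and bump the version, as a chunk migration would.
        doc = self.shards[self.owner[key]].pop(key)
        self.shards[to_shard][key] = doc
        self.owner[key] = to_shard
        self.version += 1


class Router:
    """Toy mongos: routes by a cached copy of the ownership table."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.refresh()

    def refresh(self):
        self.cached_owner = dict(self.cluster.owner)
        self.cached_version = self.cluster.version

    def find(self, key, read_preference="primary"):
        # Assumption from the report: only primary reads detect staleness.
        if read_preference == "primary" and self.cached_version != self.cluster.version:
            self.refresh()
        shard = self.cached_owner.get(key)
        doc = self.cluster.shards[shard].get(key) if shard else None
        return [doc] if doc is not None else []


cluster = Cluster(["shard0", "shard1"])
cluster.insert("some_guid", {"guid": "some_guid"}, "shard0")
second_mongos = Router(cluster)          # caches some_guid -> shard0
cluster.migrate("some_guid", "shard1")   # chunk moves; the cache is now stale

stale = second_mongos.find("some_guid", "secondary")  # routed to the old shard
fresh = second_mongos.find("some_guid", "primary")    # triggers a refresh
after = second_mongos.find("some_guid", "secondary")  # uses the refreshed cache
```

In this model the three `find` calls mirror the last three reproduction steps: the first secondary read is routed to the old shard and comes back empty, the primary read refreshes the cache, and the second secondary read then succeeds.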

      The only workaround I have come up with so far is to hit every mongos instance, at some regular interval, with a dummy primary read preference query for each sharded collection (or perhaps to run the refresh command against each mongos).
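As a rough sketch of that workaround, the helper below only builds the list of commands a scheduled job would send to each mongos; actually sending them is left to whichever driver is in use. The function name `build_refresh_plan` and the host names are hypothetical. `find` and `flushRouterConfig` are real MongoDB commands, but whether either one reliably clears the stale routing state on 3.4.5 is an assumption that has not been verified here.

```python
def build_refresh_plan(mongos_hosts, namespaces, use_flush=False):
    """Return (host, database, command) tuples to run with primary read preference.

    mongos_hosts: list of "host:port" strings, one per mongos (hypothetical names).
    namespaces:   list of "db.collection" strings for the sharded collections.
    use_flush:    if True, plan one flushRouterConfig per mongos instead of
                  one dummy find per sharded collection.
    """
    plan = []
    for host in mongos_hosts:
        if use_flush:
            # flushRouterConfig is run against the admin database on a mongos.
            plan.append((host, "admin", {"flushRouterConfig": 1}))
            continue
        for ns in namespaces:
            db, _, coll = ns.partition(".")
            # Dummy read; commands default to primary read preference.
            plan.append((host, db, {"find": coll, "limit": 1}))
    return plan


plan = build_refresh_plan(
    ["mongos-a:27017", "mongos-b:27017"],  # hypothetical mongos addresses
    ["MyDb.myCollection"],
)
flush_plan = build_refresh_plan(["mongos-a:27017"], ["MyDb.myCollection"], use_flush=True)
```

Each tuple would then be executed on a timer against the named mongos, one connection per instance, so that no mongos goes indefinitely without a primary read.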

      Background info:
      I run a single 5-node replica set that spans 3 data centers: 3 nodes in the central "primary" DC and 1 node in each of our 2 regional "secondary" DCs. My application is read-only, runs in all 3 DCs, has high read performance requirements, and has a high tolerance for eventual consistency. As a result, I run with the "nearest" read preference so that my app running in a regional DC will prefer to read from the MongoDB secondary running in the same DC, rather than going all the way back to the MongoDB primary in the central DC.

      We've hit VM RAM capacity issues and are now attempting to shard in place into 3 shards, with a mongos instance co-located with each app instance. Everything went smoothly at first, and I allowed the balancer to migrate some chunks to the new shards. After a few chunks I disabled the balancer to verify that there were no production errors, and found that objects which had moved were no longer coming back in queries by shard key.

      If I make an identical query against the mongos from the shell (which defaults to the primary read preference), I see the following in the logs and get correct results:

      2017-08-10T17:30:45.750+0000 D QUERY    [conn87] Received error status for query query: { guid: "some_guid" } sort: {} projection: {} on attempt 1 of 10: SendStaleConfig: [MyDb.myCollection] shard version not ok: version mismatch detected for MyDb.myCollection ( ns : MyDb.myCollection, received : 118|0||598b5cf1b6ff8d56d195d96f, wanted : 121|1||598b5cf1b6ff8d56d195d96f, send )
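To check for this condition across many log lines, the version fields in a message like the one above can be pulled out with a short parser. This is a minimal sketch that assumes the versions are printed as `major|minor||epoch`, as they appear in this log line; the names `ShardVersion`, `parse_version_mismatch`, and `is_stale` are hypothetical helpers, not part of any MongoDB tool.

```python
import re
from typing import NamedTuple


class ShardVersion(NamedTuple):
    major: int
    minor: int
    epoch: str


# Matches e.g. "118|0||598b5cf1b6ff8d56d195d96f"
_VERSION = r"(\d+)\|(\d+)\|\|([0-9a-f]{24})"
_MISMATCH = re.compile(
    r"ns : (?P<ns>\S+), received : " + _VERSION + r", wanted : " + _VERSION
)


def parse_version_mismatch(line):
    """Return (namespace, received, wanted), or None if the line does not match."""
    m = _MISMATCH.search(line)
    if m is None:
        return None
    received = ShardVersion(int(m.group(2)), int(m.group(3)), m.group(4))
    wanted = ShardVersion(int(m.group(5)), int(m.group(6)), m.group(7))
    return m.group("ns"), received, wanted


def is_stale(received, wanted):
    """True when the router's version is older than the shard's, same epoch."""
    return received.epoch == wanted.epoch and received[:2] < wanted[:2]


line = (
    "SendStaleConfig: [MyDb.myCollection] shard version not ok: version mismatch "
    "detected for MyDb.myCollection ( ns : MyDb.myCollection, "
    "received : 118|0||598b5cf1b6ff8d56d195d96f, "
    "wanted : 121|1||598b5cf1b6ff8d56d195d96f, send )"
)
ns, received, wanted = parse_version_mismatch(line)
```

For the line in this report, the mongos sent version 118|0 while the shard expected 121|1 under the same epoch, which is consistent with the mongos having missed the intervening chunk migrations.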
      

      Afterwards, my app's queries (using readPref=nearest) correctly return the same results.

            Assignee: Mark Agarunov (mark.agarunov)
            Reporter: Seth Kelly (skelly)
            Votes: 0
            Watchers: 11
