Core Server / SERVER-58116

StaleShardVersion error not triggering a refresh in moveChunk

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 6.0.0-rc0
    • Affects Version/s: 5.0.0, 5.1.0
    • Component/s: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Steps To Reproduce: https://jira.mongodb.org/browse/BF-21676?filter=-1 Rerun
    • Sprint: Sharding EMEA 2021-12-27, Sharding EMEA 2022-01-10, Sharding EMEA 2022-01-24, Sharding EMEA 2022-02-07, Sharding EMEA 2022-02-21, Sharding EMEA 2022-03-07, Sharding EMEA 2022-03-21
    • 32

      When a stale mongos receives a moveChunk command, it first performs a refresh. However, the mongos may have refreshed from a stale config server secondary that has not yet seen the latest split or merge operation.

      Because the split was performed through a different mongos, this mongos may not yet know of a clusterTime that includes the split, so there is no causal consistency guarantee.
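      The causality gap can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not MongoDB's refresh code. Cluster times are plain integers, and `ConfigSecondary`, `visible_version` and `can_serve` are hypothetical names. The model applies the `afterClusterTime` read concern rule: a read waits until the node has applied that time. It shows that a mongos that never learned the split's clusterTime can be served pre-split metadata without waiting.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigSecondary:
    """Hypothetical config server secondary; cluster times are plain ints."""
    last_applied: int                            # newest cluster time replicated here
    history: dict = field(default_factory=dict)  # cluster time -> collection version

    def visible_version(self) -> int:
        # Latest collection version committed at or before last_applied.
        return self.history[max(t for t in self.history if t <= self.last_applied)]

    def can_serve(self, after_cluster_time: int) -> bool:
        # afterClusterTime rule: the read must wait until that time is applied.
        return self.last_applied >= after_cluster_time


# Version 7 was committed at time 50. A split through *another* mongos produced
# version 8 at time 100, which this secondary has not replicated yet.
secondary = ConfigSecondary(last_applied=90, history={50: 7, 100: 8})

# The stale mongos has only gossiped up to time 90, so its read is served
# immediately and returns the pre-split version.
stale_mongos_time = 90
if secondary.can_serve(stale_mongos_time):
    print("stale mongos refreshed to version", secondary.visible_version())

# A mongos that knew time 100 would have to wait for the secondary to catch up.
print("causal read servable now:", secondary.can_serve(100))
```

      The point of the model is that the secondary is not wrong. The mongos simply has no clusterTime that would force it to wait.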

      For a moveChunk operation, the shard will later throw a StaleShardVersion error here.

      However, the mongos will not retry the operation. The error thrown by this code does not carry the StaleConfigInfo extra information, and without it the code in strategy.cpp abandons the retry attempt.

      Possible solutions:

      • Attach StaleConfigInfo to the exceptions on the shard
      • Perform a version check on the configsvr
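      The retry decision described above, together with the first proposed fix, can be sketched as a small model. This is hypothetical and does not reproduce strategy.cpp. `StaleShardVersionError`, `run_with_retries` and `make_shard` are illustrative names, and the Python `StaleConfigInfo` is only a stand-in whose fields are assumptions. The router refreshes and retries only when the error carries stale-config information. When that information is absent, the router gives up after the first failure.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StaleConfigInfo:
    """Stand-in for the shard's StaleConfigInfo; fields are illustrative."""
    namespace: str
    received_version: int
    wanted_version: int


class StaleShardVersionError(Exception):
    """Hypothetical error type; extra_info is None when the shard omits it."""

    def __init__(self, extra_info: Optional[StaleConfigInfo] = None):
        super().__init__("StaleShardVersion")
        self.extra_info = extra_info


def run_with_retries(send: Callable[[], str],
                     refresh: Callable[[str], None],
                     max_attempts: int = 3) -> str:
    """Hypothetical router loop: refresh and retry only if told what is stale."""
    for _ in range(max_attempts):
        try:
            return send()
        except StaleShardVersionError as err:
            if err.extra_info is None:
                # No StaleConfigInfo: nothing tells the router what to refresh,
                # so the retry is abandoned (the behavior reported in this ticket).
                raise
            refresh(err.extra_info.namespace)
    raise RuntimeError("still stale after max_attempts")


def make_shard(attach_info: bool):
    """Hypothetical shard that rejects the first moveChunk as stale."""
    state = {"calls": 0}

    def send() -> str:
        state["calls"] += 1
        if state["calls"] == 1:
            info = StaleConfigInfo("test.coll", 7, 8) if attach_info else None
            raise StaleShardVersionError(info)
        return "moveChunk ok"

    return send, state


refreshed = []

# Fix option 1: the shard attaches StaleConfigInfo, so the router refreshes
# test.coll and the second attempt succeeds.
send_with_info, _ = make_shard(attach_info=True)
print(run_with_retries(send_with_info, refreshed.append))

# Current behavior: no extra info, so the error propagates after one attempt.
send_without_info, state = make_shard(attach_info=False)
try:
    run_with_retries(send_without_info, refreshed.append)
except StaleShardVersionError:
    print("gave up after", state["calls"], "attempt")
```

      The second proposed fix, a version check on the configsvr, would instead keep the shard from acting on a stale request. It is not modeled here.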


            Assignee: Antonio Fuschetto (antonio.fuschetto@mongodb.com)
            Reporter: Simon Gratzer (Inactive) (simon.gratzer@mongodb.com)
            Votes: 0
            Watchers: 4
