Core Server / SERVER-22462

Autosplitting failure caused by stale config in runCommand

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 3.2.1
    • Component/s: Sharding
    • Labels: None
    • Assigned Teams: Sharding
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL

      We are running multiple sharded MongoDB clusters, and one of them recently started having an autosplitting issue.

      Our mongos processes have been logging the following messages:

      I SHARDING [conn14835] sharded connection to shard1/mongo-blob-1:27017,mongo-blob-2:27017 not being returned to the pool
      W SHARDING [conn14835] could not autosplit collection database_name.collection_name :: caused by :: 9996 stale config in runCommand ( ns : database_name.collection_name, received : 2|4||56b053c081c73af0480d60fe, wanted : 2|7||56b053c081c73af0480d60fe, recv )
      

      These messages always appear together and seem related. Only one of our clusters is affected. The warning appears for several databases and collections, while autosplitting still seems to work for the others.
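
      If I read the warning correctly, the version strings are chunk versions in major|minor||epoch form, so the shard expects a newer minor version (2|7) than the one the mongos sent (2|4), with the same epoch. For reference, this is a minimal sketch of how I have been comparing the metadata on the config servers with what the shard itself reports; the namespace and host names are the placeholders from the log above:

      // Connected to a mongos: chunk metadata held by the config servers for
      // the affected namespace (database_name.collection_name is a placeholder).
      use config
      db.chunks.find(
          { ns: "database_name.collection_name" },
          { min: 1, max: 1, shard: 1, lastmod: 1, lastmodEpoch: 1 }
      ).sort({ lastmod: -1 })

      // Connected to the primary of shard1 (mongo-blob-1:27017): the collection
      // version the shard has cached for the same namespace.
      db.adminCommand({ getShardVersion: "database_name.collection_name" })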

      I have tried restarting each mongod and mongos process in this specific cluster, but nothing changed. I cannot find any issues with the config servers for this cluster either. We have a replicated config server setup (the 3.2 default).

      Any advice on how to proceed? I assume this issue indicates that something is wrong with my config cluster. Are there any diagnostic commands available to check the health of the config cluster? I would prefer not to have to resync my config cluster, as that would cause downtime for my service. Could simply restarting the config servers be sufficient?
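
      For completeness, this is roughly what I have checked so far, connected to a config server member and to a mongos; everything looks healthy to me:

      // On a member of the config server replica set: replication health.
      rs.status()

      // On a mongos: overall sharding status (shards, balancer state,
      // chunk counts per collection).
      sh.status()

      // The balancer lock document, to see which mongos last held it.
      db.getSiblingDB("config").locks.find({ _id: "balancer" }).pretty()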

      I welcome any advice.

            Assignee:
            backlog-server-sharding [DO NOT USE] Backlog - Sharding Team
            Reporter:
            ruphin Goffert van Gool
            Votes:
            6
            Watchers:
            21

              Created:
              Updated:
              Resolved: