Core Server: SERVER-74663

Transitioning a server from configsvr to shardsvr doesn't succeed, but also doesn't error

    • Type: Improvement
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Sharding NYC

      Steps to reproduce:

      1. Set up a sharded cluster with the new "config-shard" feature
      2. Spin up a new mongod as a second shard, but forget to change the arguments (so it still starts with --configsvr):
        /Users/joanna/skunkworks/odcs-arm64/mongod --setParameter featureFlagCatalogShard=true --setParameter enableTestCommands=true --configsvr --replSet shard-2 --dbpath sh2-dbpath --fork --logpath sh2-dbpath/mongodb.log --logappend --port 27020
        
      3. Run rs.initiate() on this shard. This succeeds.
      4. Run sh.addShard() from the mongos. This fails with:
        	"errmsg" : "Cannot add shard-2/localhost:27020 as a shard since it is a config server",
        
      5. Shut down the node and restart it with --shardsvr:
        % /Users/joanna/skunkworks/odcs-arm64/mongod --setParameter enableTestCommands=true --shardsvr --replSet shard-2 --dbpath sh2-dbpath --fork --logpath sh2-dbpath/mongodb.log --logappend --port 27020
        {"t":{"$date":"2023-03-07T04:57:42.270Z"},"s":"I",  "c":"CONTROL",  "id":5760901, "ctx":"thread1","msg":"Applied --setParameter options","attr":{"serverParameters":{"enableTestCommands":{"default":false,"value":true}}}}
        about to fork child process, waiting until server is ready for connections.
        forked process: 89394
        ^C
        

        This hangs, although it seems to partly complete: a child process was forked and has a PID. The ^C is mine, after I got impatient.

      The shard now has an identity crisis: it has started and is listening on the right port, but it still can't connect to itself. (I assume this is because it expects a config server on port 27020 but finds a shard server that is not set up properly.)

      {"t":{"$date":"2023-03-07T15:44:06.567+11:00"},"s":"W",  "c":"SHARDING", "id":22074,   "ctx":"initandlisten","msg":"Started with --shardsvr, but no shardIdentity document was found on disk. This most likely means this server has not yet been added to a sharded cluster","attr":{"namespace":"admin.system.version"}}
      ....
      {"t":{"$date":"2023-03-07T15:44:06.691+11:00"},"s":"I",  "c":"NETWORK",  "id":23015,   "ctx":"listener","msg":"Listening on","attr":{"address":"/tmp/mongodb-27020.sock"}}
      {"t":{"$date":"2023-03-07T15:44:06.692+11:00"},"s":"I",  "c":"NETWORK",  "id":23015,   "ctx":"listener","msg":"Listening on","attr":{"address":"127.0.0.1"}}
      {"t":{"$date":"2023-03-07T15:44:06.693+11:00"},"s":"I",  "c":"NETWORK",  "id":23016,   "ctx":"listener","msg":"Waiting for connections","attr":{"port":27020,"ssl":"off"}}
      ...
      {"t":{"$date":"2023-03-07T16:19:25.068+11:00"},"s":"I",  "c":"-",        "id":4333222, "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"RSM received error response","attr":{"host":"localhost:27020","error":"NetworkInterfaceExceededTimeLimit: Couldn't get a connection within the time limit","replicaSet":"shard-2","response":{}}}
      {"t":{"$date":"2023-03-07T16:19:25.069+11:00"},"s":"I",  "c":"NETWORK",  "id":4712102, "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"Host failed in replica set","attr":{"replicaSet":"shard-2","host":"localhost:27020","error":{"code":202,"codeName":"NetworkInterfaceExceededTimeLimit","errmsg":"Couldn't get a connection within the time limit"},"action":{"dropConnections":false,"requestImmediateCheck":false,"outcome":{"host":"localhost:27020","success":false,"errorMessage":"NetworkInterfaceExceededTimeLimit: Couldn't get a connection within the time limit"}}}}
      

      The node itself is also unreachable from the shell:

      % mongo --port 27020
      MongoDB shell version v5.0.9
      connecting to: mongodb://127.0.0.1:27020/?compressors=disabled&gssapiServiceName=mongodb
      Error: couldn't connect to server 127.0.0.1:27020, connection attempt failed: SocketException: Error connecting to 127.0.0.1:27020 :: caused by :: Operation timed out :
      connect@src/mongo/shell/mongo.js:372:17
      @(connect):2:6
      exception: connect failed
      exiting with code 1
      

      Repeating the same steps on the same version, without the featureFlagCatalogShard parameter:

      1. Start mongod with --configsvr
        % /Users/joanna/skunkworks/odcs-arm64/mongod --configsvr --replSet shard-test --dbpath test-other-shard --fork --logpath test-other-shard/mongodb.log --logappend --port 27030
        
      2. Run rs.initiate()
      3. Shut down mongod and restart it with --shardsvr instead.
        The mongod refuses to start:
        % /Users/joanna/skunkworks/odcs-arm64/mongod --shardsvr --replSet shard-test --dbpath test-other-shard --fork --logpath test-other-shard/mongodb.log --logappend --port 27030
        about to fork child process, waiting until server is ready for connections.
        forked process: 89713
        ERROR: child process failed, exited with 1
        To see additional information in this output, start without the "--fork" option.
        

        Errors in the log:

        {"t":{"$date":"2023-03-07T16:02:38.455+11:00"},"s":"E",  "c":"REPL",     "id":21415,   "ctx":"ReplCoord-0","msg":"Locally stored replica set configuration is invalid; See http://www.mongodb.org/dochub/core/recover-replica-set-from-invalid-config for information on how to recover from this","attr":{"error":{"code":2,"codeName":"BadValue","errmsg":"Nodes being used for config servers must be started with the --configsvr flag"},"localConfig":{"_id":"shard-test","version":1,"term":1,"members":[{"_id":0,"host":"localhost:27030","arbiterOnly":false,"buildIndexes":true,"hidden":false,"priority":1,"tags":{},"secondaryDelaySecs":0,"votes":1}],"configsvr":true,"protocolVersion":1,"writeConcernMajorityJournalDefault":true,"settings":{"chainingAllowed":true,"heartbeatIntervalMillis":2000,"heartbeatTimeoutSecs":10,"electionTimeoutMillis":10000,"catchUpTimeoutMillis":-1,"catchUpTakeoverDelayMillis":30000,"getLastErrorModes":{},"getLastErrorDefaults":{"w":1,"wtimeout":0},"replicaSetId":{"$oid":"6406c55bc060c6ffdffd9ca7"}}}}}
        ...
        {"t":{"$date":"2023-03-07T16:02:38.455+11:00"},"s":"F",  "c":"ASSERT",   "id":23091,   "ctx":"ReplCoord-0","msg":"Fatal assertion","attr":{"msgid":28544,"file":"src/mongo/db/repl/replication_coordinator_impl.cpp","line":619}}
        {"t":{"$date":"2023-03-07T16:02:38.455+11:00"},"s":"I",  "c":"CONTROL",  "id":20710,   "ctx":"LogicalSessionCacheRefresh","msg":"Failed to refresh session cache, will try again at the next refresh interval","attr":{"error":"ShardingStateNotInitialized: sharding state is not yet initialized"}}
        {"t":{"$date":"2023-03-07T16:02:38.455+11:00"},"s":"I",  "c":"NETWORK",  "id":23015,   "ctx":"listener","msg":"Listening on","attr":{"address":"/tmp/mongodb-27030.sock"}}
        {"t":{"$date":"2023-03-07T16:02:38.456+11:00"},"s":"F",  "c":"ASSERT",   "id":23092,   "ctx":"ReplCoord-0","msg":"\n\n***aborting after fassert() failure\n\n"}
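Because mongod writes one JSON object per log line, the two failure modes (the silent hang with featureFlagCatalogShard versus the fatal assertion without it) can be compared by pulling the warning, error and fatal entries out of the attached logs. The following Python sketch is a hypothetical helper: `problem_entries`, `summarize` and `stored_config_is_configsvr` are illustrative names, not MongoDB tooling. It assumes only the structured-log fields visible in the excerpts above (`s` for severity, `c` for component, `id`, `msg`, `attr`), and it skips console lines that are not JSON.

```python
import json
from typing import Iterable, Iterator

# Severity codes in mongod's structured log: "F" fatal, "E" error,
# "W" warning ("I" informational and "D1".."D5" debug are ignored here).
PROBLEM_SEVERITIES = {"F", "E", "W"}


def problem_entries(lines: Iterable[str]) -> Iterator[dict]:
    """Yield warning/error/fatal entries from mongod JSON log lines.

    Non-JSON lines (e.g. "about to fork child process ...") are skipped.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("{"):
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if entry.get("s") in PROBLEM_SEVERITIES:
            yield entry


def summarize(entry: dict) -> str:
    """One-line summary: severity, component, log id, msg, plus errmsg if present."""
    error = entry.get("attr", {}).get("error")
    if isinstance(error, dict):
        detail = error.get("errmsg", "")
    elif isinstance(error, str):
        detail = error
    else:
        detail = ""
    summary = f'{entry["s"]} {entry["c"]} {entry["id"]}: {entry["msg"].strip()}'
    return f"{summary} ({detail})" if detail else summary


def stored_config_is_configsvr(entry: dict) -> bool:
    """True if the entry's attr.localConfig records "configsvr": true."""
    local_config = entry.get("attr", {}).get("localConfig", {})
    return bool(local_config.get("configsvr", False))
```

For example, iterating `problem_entries(open("test-other-shard/mongodb.log"))` and printing `summarize(...)` for each entry should surface the BadValue `errmsg` and the fassert from the second scenario, while `stored_config_is_configsvr` shows that the locally stored replica set config still carries `"configsvr": true`, which is what the --shardsvr restart rejects.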
        

      Attachments:
        1. test-other-shard-mongodb.log (84 kB)
        2. shard-2-mongodb.log (1.21 MB)

            Assignee:
            backlog-server-sharding-nyc [DO NOT USE] Backlog - Sharding NYC
            Reporter:
            joanna.cheng@mongodb.com Joanna Cheng
            Votes:
            0
            Watchers:
            6
