-
Type: Improvement
-
Resolution: Duplicate
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
Sharding NYC
Steps to reproduce:
- Set up a sharded cluster with the new "config-shard" feature
- Spin up a new mongod as a second shard. (Forget to change the arguments, so it is still started with --configsvr and the feature flag.)
/Users/joanna/skunkworks/odcs-arm64/mongod --setParameter featureFlagCatalogShard=true --setParameter enableTestCommands=true --configsvr --replSet shard-2 --dbpath sh2-dbpath --fork --logpath sh2-dbpath/mongodb.log --logappend --port 27020
- rs.initiate() this shard. This succeeds
- Try to sh.addShard() from your mongos. This fails
"errmsg" : "Cannot add shard-2/localhost:27020 as a shard since it is a config server",
- Shut down the node and restart it with --shardsvr
% /Users/joanna/skunkworks/odcs-arm64/mongod --setParameter enableTestCommands=true --shardsvr --replSet shard-2 --dbpath sh2-dbpath --fork --logpath sh2-dbpath/mongodb.log --logappend --port 27020
{"t":{"$date":"2023-03-07T04:57:42.270Z"},"s":"I", "c":"CONTROL", "id":5760901, "ctx":"thread1","msg":"Applied --setParameter options","attr":{"serverParameters":{"enableTestCommands":{"default":false,"value":true}}}}
about to fork child process, waiting until server is ready for connections.
forked process: 89394
^C
This hangs, although the fork itself seems to complete (I have a forked PID). The ^C is from me after I got impatient.
The shard now has an identity crisis - it's started, and listening on the right port, but still can't connect to itself. (I assume this is because it's expecting a config server on port 27020, but is getting a not-set-up-properly shard server)
{"t":{"$date":"2023-03-07T15:44:06.567+11:00"},"s":"W", "c":"SHARDING", "id":22074, "ctx":"initandlisten","msg":"Started with --shardsvr, but no shardIdentity document was found on disk. This most likely means this server has not yet been added to a sharded cluster","attr":{"namespace":"admin.system.version"}}
....
{"t":{"$date":"2023-03-07T15:44:06.691+11:00"},"s":"I", "c":"NETWORK", "id":23015, "ctx":"listener","msg":"Listening on","attr":{"address":"/tmp/mongodb-27020.sock"}}
{"t":{"$date":"2023-03-07T15:44:06.692+11:00"},"s":"I", "c":"NETWORK", "id":23015, "ctx":"listener","msg":"Listening on","attr":{"address":"127.0.0.1"}}
{"t":{"$date":"2023-03-07T15:44:06.693+11:00"},"s":"I", "c":"NETWORK", "id":23016, "ctx":"listener","msg":"Waiting for connections","attr":{"port":27020,"ssl":"off"}}
...
{"t":{"$date":"2023-03-07T16:19:25.068+11:00"},"s":"I", "c":"-", "id":4333222, "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"RSM received error response","attr":{"host":"localhost:27020","error":"NetworkInterfaceExceededTimeLimit: Couldn't get a connection within the time limit","replicaSet":"shard-2","response":{}}}
{"t":{"$date":"2023-03-07T16:19:25.069+11:00"},"s":"I", "c":"NETWORK", "id":4712102, "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"Host failed in replica set","attr":{"replicaSet":"shard-2","host":"localhost:27020","error":{"code":202,"codeName":"NetworkInterfaceExceededTimeLimit","errmsg":"Couldn't get a connection within the time limit"},"action":{"dropConnections":false,"requestImmediateCheck":false,"outcome":{"host":"localhost:27020","success":false,"errorMessage":"NetworkInterfaceExceededTimeLimit: Couldn't get a connection within the time limit"}}}}
The node itself also cannot be reached from the shell:
% mongo --port 27020
MongoDB shell version v5.0.9
connecting to: mongodb://127.0.0.1:27020/?compressors=disabled&gssapiServiceName=mongodb
Error: couldn't connect to server 127.0.0.1:27020, connection attempt failed: SocketException: Error connecting to 127.0.0.1:27020 :: caused by :: Operation timed out :
connect@src/mongo/shell/mongo.js:372:17
@(connect):2:6
exception: connect failed
exiting with code 1
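When triaging a log like the one above, it can help to pull out only the warning, error, and fatal entries. The following is a minimal Python sketch, not part of the original report. It assumes the log is in mongod's structured JSON format with the severity in the "s" field, as in the excerpts above. The function name notable_entries and the sample list are hypothetical, and the sample lines are copied from the excerpt.

```python
import json


def notable_entries(log_lines, severities=("W", "E", "F")):
    """Return (severity, id, msg) tuples for structured log lines
    whose severity is in `severities`."""
    found = []
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip elisions ("...") and non-JSON output
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or otherwise malformed lines
        if entry.get("s") in severities:
            found.append((entry["s"], entry.get("id"), entry.get("msg")))
    return found


# Sample lines copied from the excerpt in this report.
sample = [
    '{"t":{"$date":"2023-03-07T15:44:06.567+11:00"},"s":"W", "c":"SHARDING", "id":22074, "ctx":"initandlisten","msg":"Started with --shardsvr, but no shardIdentity document was found on disk. This most likely means this server has not yet been added to a sharded cluster","attr":{"namespace":"admin.system.version"}}',
    '....',
    '{"t":{"$date":"2023-03-07T15:44:06.693+11:00"},"s":"I", "c":"NETWORK", "id":23016, "ctx":"listener","msg":"Waiting for connections","attr":{"port":27020,"ssl":"off"}}',
]

for severity, log_id, msg in notable_entries(sample):
    print(severity, log_id, msg)
```

In this scenario the only warning is id 22074. The later ReplicaSetMonitor timeouts are logged at severity "I", so they would need to be searched for by id or message instead.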
Repeating the same steps on the same version, without the special featureFlagCatalogShard:
- Start mongod with --configsvr
% /Users/joanna/skunkworks/odcs-arm64/mongod --configsvr --replSet shard-test --dbpath test-other-shard --fork --logpath test-other-shard/mongodb.log --logappend --port 27030
- Run rs.initiate()
- Shut down mongod and restart with --shardsvr instead
The mongod refuses to start
% /Users/joanna/skunkworks/odcs-arm64/mongod --shardsvr --replSet shard-test --dbpath test-other-shard --fork --logpath test-other-shard/mongodb.log --logappend --port 27030
about to fork child process, waiting until server is ready for connections.
forked process: 89713
ERROR: child process failed, exited with 1
To see additional information in this output, start without the "--fork" option.
Error in logs
{"t":{"$date":"2023-03-07T16:02:38.455+11:00"},"s":"E", "c":"REPL", "id":21415, "ctx":"ReplCoord-0","msg":"Locally stored replica set configuration is invalid; See http://www.mongodb.org/dochub/core/recover-replica-set-from-invalid-config for information on how to recover from this","attr":{"error":{"code":2,"codeName":"BadValue","errmsg":"Nodes being used for config servers must be started with the --configsvr flag"},"localConfig":{"_id":"shard-test","version":1,"term":1,"members":[{"_id":0,"host":"localhost:27030","arbiterOnly":false,"buildIndexes":true,"hidden":false,"priority":1,"tags":{},"secondaryDelaySecs":0,"votes":1}],"configsvr":true,"protocolVersion":1,"writeConcernMajorityJournalDefault":true,"settings":{"chainingAllowed":true,"heartbeatIntervalMillis":2000,"heartbeatTimeoutSecs":10,"electionTimeoutMillis":10000,"catchUpTimeoutMillis":-1,"catchUpTakeoverDelayMillis":30000,"getLastErrorModes":{},"getLastErrorDefaults":{"w":1,"wtimeout":0},"replicaSetId":{"$oid":"6406c55bc060c6ffdffd9ca7"}}}}}
...
{"t":{"$date":"2023-03-07T16:02:38.455+11:00"},"s":"F", "c":"ASSERT", "id":23091, "ctx":"ReplCoord-0","msg":"Fatal assertion","attr":{"msgid":28544,"file":"src/mongo/db/repl/replication_coordinator_impl.cpp","line":619}}
{"t":{"$date":"2023-03-07T16:02:38.455+11:00"},"s":"I", "c":"CONTROL", "id":20710, "ctx":"LogicalSessionCacheRefresh","msg":"Failed to refresh session cache, will try again at the next refresh interval","attr":{"error":"ShardingStateNotInitialized: sharding state is not yet initialized"}}
{"t":{"$date":"2023-03-07T16:02:38.455+11:00"},"s":"I", "c":"NETWORK", "id":23015, "ctx":"listener","msg":"Listening on","attr":{"address":"/tmp/mongodb-27030.sock"}}
{"t":{"$date":"2023-03-07T16:02:38.456+11:00"},"s":"F", "c":"ASSERT", "id":23092, "ctx":"ReplCoord-0","msg":"\n\n***aborting after fassert() failure\n\n"}
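The behaviour without the feature flag amounts to a consistency check between the stored replica set config and the role the node was started with. Below is a rough Python sketch of that idea, for illustration only. It is not the server's implementation, and the function name role_mismatch is hypothetical. It assumes only what the log above shows: the stored config carries "configsvr": true, and the node was restarted without --configsvr.

```python
def role_mismatch(local_config, started_with_configsvr):
    """Return a description of the mismatch, or None if the stored config
    and the startup role agree.

    Only the direction seen in this report is modelled: a stored config
    marked configsvr, on a node started without --configsvr.
    """
    stored_as_configsvr = bool(local_config.get("configsvr", False))
    if stored_as_configsvr and not started_with_configsvr:
        return ("stored replica set config has configsvr: true, "
                "but the node was not started with --configsvr")
    return None


# Trimmed copy of the localConfig from the log entry above.
local_config = {"_id": "shard-test", "version": 1, "configsvr": True}

# Restarted with --shardsvr, as in the second scenario.
print(role_mismatch(local_config, started_with_configsvr=False))
```

With the feature flag enabled in the first scenario, the node instead started and then hung, which is the gap this ticket asks to close.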
- duplicates
-
SERVER-74311 Add sanity check assertions that only a config server has config shard identity
- Closed