-
Type: Improvement
-
Resolution: Done
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: Diagnostics, Sharding
-
None
-
Minor Change
-
Sharding 12 (04/01/16), Sharding 2018-07-16
The CSRS-equivalent follow-on from SERVER-15879.
If a mongos is started with an incorrect replica set name in the --configdb parameter, there is no clear message indicating that this is the problem.
$ /var/lib/mongodb-mms-automation/mongodb-linux-x86_64-3.3.3-dev/bin/mongos --port 36901 --configdb config/devkev-1.devkev.5431.mongodbdns.com:36800 2016-03-22T07:05:21.239+0000 W SHARDING [main] Running a sharded cluster with fewer than 3 config servers should only be done for testing purposes and is not recommended for production. 2016-03-22T07:05:21.243+0000 I CONTROL [main] 2016-03-22T07:05:21.243+0000 I CONTROL [main] ** NOTE: This is a development version (3.3.3) of MongoDB. 2016-03-22T07:05:21.243+0000 I CONTROL [main] ** Not recommended for production. 2016-03-22T07:05:21.243+0000 I CONTROL [main] 2016-03-22T07:05:21.243+0000 I SHARDING [mongosMain] MongoS version 3.3.3 starting: pid=22219 port=36901 64-bit host=devkev-1 (--help for usage) 2016-03-22T07:05:21.243+0000 I CONTROL [mongosMain] db version v3.3.3 2016-03-22T07:05:21.243+0000 I CONTROL [mongosMain] git version: 1eedf75bf5ac0098f212d931bad94bd8b54060c3 2016-03-22T07:05:21.243+0000 I CONTROL [mongosMain] allocator: tcmalloc 2016-03-22T07:05:21.243+0000 I CONTROL [mongosMain] modules: none 2016-03-22T07:05:21.243+0000 I CONTROL [mongosMain] build environment: 2016-03-22T07:05:21.243+0000 I CONTROL [mongosMain] distarch: x86_64 2016-03-22T07:05:21.243+0000 I CONTROL [mongosMain] target_arch: x86_64 2016-03-22T07:05:21.243+0000 I CONTROL [mongosMain] options: { net: { port: 36901 }, sharding: { configDB: "config/devkev-1.devkev.5431.mongodbdns.com:36800" } } 2016-03-22T07:05:21.243+0000 I SHARDING [mongosMain] Updating config server connection string to: config/devkev-1.devkev.5431.mongodbdns.com:36800 2016-03-22T07:05:21.243+0000 I NETWORK [mongosMain] Starting new replica set monitor for config/devkev-1.devkev.5431.mongodbdns.com:36800 2016-03-22T07:05:21.243+0000 I NETWORK [ReplicaSetMonitorWatcher] starting 2016-03-22T07:05:21.245+0000 I SHARDING [thread1] creating distributed lock ping thread for process devkev-1:36901:1458630321:439377670 (sleeping for 30000ms) 2016-03-22T07:05:21.246+0000 W NETWORK [mongosMain] node: devkev-1.devkev.5431.mongodbdns.com:36800 isn't a part of set: config ismaster: { hosts: [ "devkev-1.devkev.5431.mongodbdns.com:36800" ], setName: "configRS", setVersion: 1, ismaster: true, secondary: false, primary: "devkev-1.devkev.5431.mongodbdns.com:36800", me: "devkev-1.devkev.5431.mongodbdns.com:36800", electionId: ObjectId('7fffffff0000000000000001'), configsvr: 1, maxBsonObjectSize: 16777216, maxMessageSizeBytes: 48000000, maxWriteBatchSize: 1000, localTime: new Date(1458630321246), maxWireVersion: 4, minWireVersion: 0, readOnly: false, ok: 1.0 } 2016-03-22T07:05:21.246+0000 W NETWORK [mongosMain] No primary detected for set config 2016-03-22T07:05:21.246+0000 I NETWORK [mongosMain] All nodes for set config are down. This has happened for 1 checks in a row. Polling will stop after 29 more failed checks 2016-03-22T07:05:21.746+0000 W NETWORK [mongosMain] node: devkev-1.devkev.5431.mongodbdns.com:36800 isn't a part of set: config ismaster: { hosts: [ "devkev-1.devkev.5431.mongodbdns.com:36800" ], setName: "configRS", setVersion: 1, ismaster: true, secondary: false, primary: "devkev-1.devkev.5431.mongodbdns.com:36800", me: "devkev-1.devkev.5431.mongodbdns.com:36800", electionId: ObjectId('7fffffff0000000000000001'), configsvr: 1, maxBsonObjectSize: 16777216, maxMessageSizeBytes: 48000000, maxWriteBatchSize: 1000, localTime: new Date(1458630321746), maxWireVersion: 4, minWireVersion: 0, readOnly: false, ok: 1.0 } 2016-03-22T07:05:21.746+0000 W NETWORK [mongosMain] No primary detected for set config 2016-03-22T07:05:21.747+0000 I NETWORK [mongosMain] All nodes for set config are down. This has happened for 2 checks in a row. Polling will stop after 28 more failed checks 2016-03-22T07:05:22.247+0000 W NETWORK [mongosMain] node: devkev-1.devkev.5431.mongodbdns.com:36800 isn't a part of set: config ismaster: { hosts: [ "devkev-1.devkev.5431.mongodbdns.com:36800" ], setName: "configRS", setVersion: 1, ismaster: true, secondary: false, primary: "devkev-1.devkev.5431.mongodbdns.com:36800", me: "devkev-1.devkev.5431.mongodbdns.com:36800", electionId: ObjectId('7fffffff0000000000000001'), configsvr: 1, maxBsonObjectSize: 16777216, maxMessageSizeBytes: 48000000, maxWriteBatchSize: 1000, localTime: new Date(1458630322247), maxWireVersion: 4, minWireVersion: 0, readOnly: false, ok: 1.0 } 2016-03-22T07:05:22.247+0000 W NETWORK [mongosMain] No primary detected for set config 2016-03-22T07:05:22.247+0000 I NETWORK [mongosMain] All nodes for set config are down. This has happened for 3 checks in a row. Polling will stop after 27 more failed checks ...
The mongos is not yet accepting connections, and continues to retry forever. This retry behaviour is legit, because it's appropriate for transient errors (eg. config servers temporarily down). However, passing the wrong replica set name to --configdb is not a transient error.
The predominant error messages are all confusing:
node: devkev-1.devkev.5431.mongodbdns.com:36800 isn't a part of set: config ismaster: { hosts: [ "devkev-1.devkev.5431.mongodbdns.com:36800" ], setName: "configRS", setVersion: 1, ismaster: true, secondary: false, primary: "devkev-1.devkev.5431.mongodbdns.com:36800", me: "devkev-1.devkev.5431.mongodbdns.com:36800", electionId: ObjectId('7fffffff0000000000000001'), configsvr: 1, maxBsonObjectSize: 16777216, maxMessageSizeBytes: 48000000, maxWriteBatchSize: 1000, localTime: new Date(1458630321246), maxWireVersion: 4, minWireVersion: 0, readOnly: false, ok: 1.0 }
Since the hostname which appears after "node:" is correct, this is confusing for users because the host clearly is a member of the replica set — it shows up in rs.status() on the CSRS, and even shows up later on this line itself in the "hosts" field of ismaster.
No primary detected for set config
This is confusing because there clearly is a primary in the set — again, as indicated in rs.status() on the CSRS, and in the "primary" field of the ismaster section on the above line.
All nodes for set config are down.
Again, this is confusing because the nodes in the set are clearly up, and the mongos is not reporting any trouble with connecting to them (apart from the also mysterious inability to locate a primary).
In the case where the setName returned by ismaster doesn't match the replset name given in --configdb, there should be a clear message stating this.
For example, the above logs would be much clearer if they were something like:
2016-03-22T07:05:21.243+0000 I SHARDING [mongosMain] Updating config server connection string to: config/devkev-1.devkev.5431.mongodbdns.com:36800 2016-03-22T07:05:21.243+0000 I NETWORK [mongosMain] Starting new replica set monitor for config/devkev-1.devkev.5431.mongodbdns.com:36800 2016-03-22T07:05:21.243+0000 I NETWORK [ReplicaSetMonitorWatcher] starting 2016-03-22T07:05:21.245+0000 I SHARDING [thread1] creating distributed lock ping thread for process devkev-1:36901:1458630321:439377670 (sleeping for 30000ms) +2016-03-22T07:05:21.246+0000 E NETWORK [mongosMain] Error: Remote config server replica set name "configRS" does not match expected replica set name "config" 2016-03-22T07:05:21.246+0000 W NETWORK [mongosMain] node: devkev-1.devkev.5431.mongodbdns.com:36800 isn't a part of set: config ismaster: { hosts: [ "devkev-1.devkev.5431.mongodbdns.com:36800" ], setName: "configRS", setVersion: 1, ismaster: true, secondary: false, primary: "devkev-1.devkev.5431.mongodbdns.com:36800", me: "devkev-1.devkev.5431.mongodbdns.com:36800", electionId: ObjectId('7fffffff0000000000000001'), configsvr: 1, maxBsonObjectSize: 16777216, maxMessageSizeBytes: 48000000, maxWriteBatchSize: 1000, localTime: new Date(1458630321246), maxWireVersion: 4, minWireVersion: 0, readOnly: false, ok: 1.0 } 2016-03-22T07:05:21.246+0000 W NETWORK [mongosMain] No primary detected for set config 2016-03-22T07:05:21.246+0000 I NETWORK [mongosMain] All nodes for set config are down. This has happened for 1 checks in a row. Polling will stop after 29 more failed checks
- related to
-
SERVER-15879 Better error reporting when the mongos configdb string does not match with cached one
- Closed