Root cause found - see comment 4
For testing I set up a 3.2-rc1 cluster with 32 shards, using WiredTiger with zlib compression, EVERYTHING on LOCALHOST.
I used a SINGLE, non-replica-set config server, which rc1 allows me to do (I raised a previous ticket that it should).
I was able to successfully load 800GB of sharded data using a Java-based loader.
After restarting the cluster I get:
MongoDB Enterprise mongos> show dbs
2015-10-30T10:30:24.140+0000 E QUERY [thread1] Error: listDatabases failed:{ "code" : 6, "ok" : 0, "errmsg" : "Connection refused" } :
_getErrorWithCode@src/mongo/shell/utils.js:23:13
Mongo.prototype.getDBs@src/mongo/shell/mongo.js:53:1
shellHelper.show@src/mongo/shell/utils.js:697:19
shellHelper@src/mongo/shell/utils.js:591:15
@(shellhelp2):1:1
MongoDB Enterprise mongos>
sh.status() appears to work OK.
show collections gives:
MongoDB Enterprise mongos> show collections
2015-10-30T10:31:16.473+0000 E QUERY [thread1] Error: listCollections failed: { "code" : 13328, "ok" : 0, "errmsg" : "connection pool: connect failed localhost:27102 : couldn't connect to server localhost:27102, connection attempt failed" } :
_getErrorWithCode@src/mongo/shell/utils.js:23:13
DB.prototype._getCollectionInfosCommand@src/mongo/shell/db.js:746:1
DB.prototype.getCollectionInfos@src/mongo/shell/db.js:758:15
DB.prototype.getCollectionNames@src/mongo/shell/db.js:769:12
shellHelper.show@src/mongo/shell/utils.js:692:9
shellHelper@src/mongo/shell/utils.js:591:15
@(shellhelp2):1:1
All mongod logs show:
2015-10-30T10:23:04.744+0000 I CONTROL [initandlisten] options: { net: { port: 27101 }, processManagement: { fork: true }, storage: { dbPath: "/data/shard1", journal: { enabled: false }, wiredTiger: { collectionConfig: { blockCompressor: "zlib" }, engineConfig: { cacheSizeGB: 1 } } }, systemLog: { destination: "file", path: "/data/log1.log" } }
2015-10-30T10:23:04.746+0000 I FTDC [initandlisten] Starting full-time diagnostic data capture with directory '/data/shard1/diagnostic.data'
2015-10-30T10:23:04.746+0000 I NETWORK [HostnameCanonicalizationWorker] Starting hostname canonicalization worker
2015-10-30T10:23:04.746+0000 I SHARDING [initandlisten] Sharding state recovery process found document { _id: "minOpTimeRecovery", configsvrConnectionString: "localhost:27019", shardName: "shard0000", minOpTime: { ts: Timestamp 0|0, t: -1 }, minOpTimeUpdaters: 0 }
2015-10-30T10:23:04.746+0000 I SHARDING [initandlisten] first cluster operation detected, adding sharding hook to enable versioning and authentication to remote servers
2015-10-30T10:23:04.747+0000 I SHARDING [initandlisten] Updating config server connection string to: localhost:27019
2015-10-30T10:23:04.748+0000 W NETWORK [initandlisten] Failed to connect to 127.0.0.1:27019, reason: errno:111 Connection refused
2015-10-30T10:23:04.748+0000 I STORAGE [initandlisten] exception in initAndListen: 13328 connection pool: connect failed localhost:27019 : couldn't connect to server localhost:27019, connection attempt failed, terminating
2015-10-30T10:23:04.748+0000 I FTDC [initandlisten] Stopping full-time diagnostic data capture
I can connect to the config server with the mongo shell on localhost:27019.
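The log above shows the mongod on port 27101 terminating at 10:23:04 because its connection to localhost:27019 was refused, while the config server is reachable now. A minimal sketch for checking which of the ports are accepting TCP connections, and for waiting on one before starting a dependent process, could look like the following. It uses only the Python standard library; the helper names is_listening and wait_for_port are hypothetical, and the port list is taken from the output above. This only tests TCP reachability and is not a statement of the root cause.

```python
import socket
import time


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and name resolution errors.
        return False


def wait_for_port(host, port, deadline_s=30.0, interval_s=0.5):
    """Poll host:port until it accepts connections or the deadline passes."""
    end = time.monotonic() + deadline_s
    while True:
        if is_listening(host, port):
            return True
        if time.monotonic() >= end:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    # Ports seen in this ticket: 27019 (config server), 27101 and 27102 (shards).
    for port in (27019, 27101, 27102):
        state = "up" if is_listening("localhost", port) else "down"
        print(port, state)
```

A restart script could call wait_for_port("localhost", 27019) before launching the shard mongod processes, so a shard is not started while the config server port is still refusing connections.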