- Type: Bug
- Resolution: Duplicate
- Priority: Major - P3
- None
- Affects Version/s: 2.0.4, 2.0.6
- Environment: OS X, Linux
ALL
Hi.
I have a sharded cluster with authentication enabled. If I stop one of the config servers along with one of my data nodes, I start getting this error when I connect to mongos and run any command:
uncaught exception: error { "$err" : "socket exception", "code" : 9001 }
The problem appears to be worse in 2.0.6: shutting down just a single config server is enough to immediately start producing socket exception errors.
Looks like this is probably related to SERVER-6178.
Steps to reproduce:
1. Create a sharded cluster with authentication enabled, using the following configuration:
   - 2 shards, each with 2 data nodes and 1 arbiter
   - 3 config servers
   - 1 mongos
2. Add an admin user
3. Stop one config server
4. Stop the secondary on one shard
5. Wait a few minutes; the errors seem to start after the SyncClusterConnection fails
6. Attempt to connect
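For anyone trying to rebuild the topology from step 1, here is a rough sketch in Python that only assembles the command lines for each process; it does not launch anything. The config server ports (50010, 50020, 50030), the shard member at port 20020, and the mongos port 27017 are taken from the log below. All other ports, the replica set names, and the paths are assumptions made up for illustration. Which shard member acts as the arbiter is decided in the replica set configuration rather than on the command line, so it is not shown here.

```python
# Ports 50010/50020/50030 (config servers), 20020 (one shard member) and
# 27017 (mongos) appear in the report. Everything else below is assumed.
CONFIG_PORTS = [50010, 50020, 50030]
SHARD_PORTS = {
    "shard1": [20010, 20020, 20030],  # assumed: 2 data nodes + 1 arbiter
    "shard2": [30010, 30020, 30030],  # assumed ports and set name
}
MONGOS_PORT = 27017
KEY_FILE = "/tmp/repro/keyfile"  # assumed path; a key file enables auth
DATA_ROOT = "/tmp/repro/data"    # assumed path


def build_commands():
    """Return one argv list per process in the repro topology."""
    commands = []

    # Three config servers.
    for port in CONFIG_PORTS:
        commands.append([
            "mongod", "--configsvr",
            "--port", str(port),
            "--dbpath", f"{DATA_ROOT}/config-{port}",
            "--keyFile", KEY_FILE,
        ])

    # Two shards, each a three-member replica set.
    for set_name, ports in SHARD_PORTS.items():
        for port in ports:
            commands.append([
                "mongod", "--shardsvr",
                "--replSet", set_name,
                "--port", str(port),
                "--dbpath", f"{DATA_ROOT}/{set_name}-{port}",
                "--keyFile", KEY_FILE,
            ])

    # One mongos pointing at all three config servers.
    configdb = ",".join(f"localhost:{port}" for port in CONFIG_PORTS)
    commands.append([
        "mongos",
        "--configdb", configdb,
        "--port", str(MONGOS_PORT),
        "--keyFile", KEY_FILE,
    ])
    return commands


if __name__ == "__main__":
    for argv in build_commands():
        print(" ".join(argv))
```

Steps 3 and 4 then amount to stopping one of the `--configsvr` processes and the secondary of one shard.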
I turned up logging in mongos and got the following:
Tue Jul 3 22:04:33 [mongosMain] connection accepted from 127.0.0.1:62779 #32
Tue Jul 3 22:04:33 [conn32] authenticate:
Tue Jul 3 22:04:33 [conn32] DBClientCursor::init call() failed
Tue Jul 3 22:04:33 [conn32] sharded connection to localhost:50010,localhost:50020,localhost:50030 not being returned to the pool
Tue Jul 3 22:04:33 [conn32] end connection 127.0.0.1:62779
Tue Jul 3 22:04:35 [ReplicaSetMonitorWatcher] trying reconnect to localhost:20020
Tue Jul 3 22:04:35 [ReplicaSetMonitorWatcher] reconnect localhost:20020 failed couldn't connect to server localhost:20020
Tue Jul 3 22:04:35 [LockPinger] SyncClusterConnection connecting to [localhost:50010]
Tue Jul 3 22:04:35 [LockPinger] SyncClusterConnection connecting to [localhost:50020]
Tue Jul 3 22:04:35 [LockPinger] SyncClusterConnection connecting to [localhost:50030]
Tue Jul 3 22:04:35 [LockPinger] SyncClusterConnection connect fail to: localhost:50030 errmsg: couldn't connect to server localhost:50030
Tue Jul 3 22:04:35 [LockPinger] trying reconnect to localhost:50030
Tue Jul 3 22:04:35 [LockPinger] reconnect localhost:50030 failed couldn't connect to server localhost:50030
Tue Jul 3 22:04:35 [LockPinger] warning: distributed lock pinger 'localhost:50010,localhost:50020,localhost:50030/Jeffs-MacBook-Air.local:27017:1341378125:16807' detected an exception while pinging. :: caused by :: socket exception
Tue Jul 3 22:04:38 [mongosMain] connection accepted from 127.0.0.1:62792 #33
Tue Jul 3 22:04:38 [conn33] authenticate:
Tue Jul 3 22:04:38 [conn33] SyncClusterConnection connecting to [localhost:50010]
Tue Jul 3 22:04:38 [conn33] SyncClusterConnection connecting to [localhost:50020]
Tue Jul 3 22:04:38 [conn33] SyncClusterConnection connecting to [localhost:50030]
Tue Jul 3 22:04:38 [conn33] SyncClusterConnection connect fail to: localhost:50030 errmsg: couldn't connect to server localhost:50030
Tue Jul 3 22:04:38 [conn33] trying reconnect to localhost:50030
Tue Jul 3 22:04:38 [conn33] reconnect localhost:50030 failed couldn't connect to server localhost:50030
Tue Jul 3 22:04:38 [conn33] DBException in process: socket exception
Tue Jul 3 22:04:38 [conn33] end connection 127.0.0.1:62792
Tue Jul 3 22:04:40 [mongosMain] connection accepted from 127.0.0.1:62800 #34
Tue Jul 3 22:04:40 [conn34] authenticate:
Tue Jul 3 22:04:40 [conn34] SyncClusterConnection connecting to [localhost:50010]
Tue Jul 3 22:04:40 [conn34] SyncClusterConnection connecting to [localhost:50020]
Tue Jul 3 22:04:40 [conn34] SyncClusterConnection connecting to [localhost:50030]
Tue Jul 3 22:04:40 [conn34] SyncClusterConnection connect fail to: localhost:50030 errmsg: couldn't connect to server localhost:50030
Tue Jul 3 22:04:40 [conn34] trying reconnect to localhost:50030
Tue Jul 3 22:04:40 [conn34] reconnect localhost:50030 failed couldn't connect to server localhost:50030
Tue Jul 3 22:04:40 [conn34] DBException in process: socket exception
Tue Jul 3 22:04:40 [conn34] end connection 127.0.0.1:62800
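To make the pattern in the log easier to see, the following sketch groups the connection failures by the thread that reported them. It is a small Python helper written for this report, not part of any MongoDB tooling; the function name `unreachable_hosts` and the regular expressions are assumptions based only on the line format shown above.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   Tue Jul 3 22:04:38 [conn33] reconnect localhost:50030 failed ...
LINE_RE = re.compile(
    r"^\w{3} \w{3} +\d+ [\d:]+ \[(?P<thread>[^\]]+)\] (?P<msg>.*)$"
)

# Matches the two failure messages seen in the log:
#   "... connect fail to: HOST:PORT errmsg: ..."
#   "reconnect HOST:PORT failed ..."
FAIL_RE = re.compile(
    r"(?:connect fail to:|reconnect) (?P<host>[\w.-]+:\d+) (?:errmsg|failed)"
)


def unreachable_hosts(log_text):
    """Map each logging thread to the set of hosts it failed to reach."""
    failures = defaultdict(set)
    for line in log_text.splitlines():
        line_match = LINE_RE.match(line.strip())
        if line_match is None:
            continue  # not a log line in the expected format
        fail_match = FAIL_RE.search(line_match.group("msg"))
        if fail_match is not None:
            failures[line_match.group("thread")].add(fail_match.group("host"))
    return dict(failures)


if __name__ == "__main__":
    sample = "\n".join([
        "Tue Jul 3 22:04:35 [ReplicaSetMonitorWatcher] reconnect localhost:20020 failed couldn't connect to server localhost:20020",
        "Tue Jul 3 22:04:38 [conn33] trying reconnect to localhost:50030",
        "Tue Jul 3 22:04:38 [conn33] reconnect localhost:50030 failed couldn't connect to server localhost:50030",
    ])
    for thread, hosts in sorted(unreachable_hosts(sample).items()):
        print(thread, sorted(hosts))
```

Run against the full excerpt, this should show every client connection that reaches the config servers (conn33, conn34) and the LockPinger failing on localhost:50030, the stopped config server, while only the ReplicaSetMonitorWatcher reports localhost:20020, the stopped shard secondary.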
- depends on
  - SERVER-6378 auth against config server fails if any server is down (Closed)