- Type: Bug
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: 1.7.5
- Component/s: Replication, Sharding
- None
- Environment: Linux x86/x86_64
- Linux
Summary:
mongos fails temporarily when the primary member of the replica set goes down ("dbclient error communicating with server: <host>:<port>"), then fails semi-permanently until all replica set members are up again ("mongos connectionpool: connect failed <replicaSet>/<host>:<port>[,<host>:<port>...]" and "not master and slaveok=false").
I wonder whether mongos should:
1. Auto-retry/auto-reconnect, at least for read operations
2. Not keep failing until every replica set member is running again
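Until mongos retries on its own, a client could approximate point 1 with a small wrapper. The following is only a sketch: `retryRead` and `RETRYABLE_CODES` are hypothetical names, the two codes are the ones seen in the session in this ticket, and it assumes the caller surfaces failures as thrown objects with a numeric `code` property (the shell may not do this by default).

```javascript
// Error codes copied from the session in this ticket. Treating them as
// retryable is an assumption of this sketch, not documented mongos behavior.
var RETRYABLE_CODES = [10278, 11002];

// Hypothetical helper: calls readFn until it succeeds, a non-retryable
// error is thrown, or maxAttempts calls have been made.
function retryRead(readFn, maxAttempts) {
    if (maxAttempts < 1) {
        throw new Error("maxAttempts must be at least 1");
    }
    var lastError = null;
    for (var attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            return readFn();
        } catch (e) {
            lastError = e;
            // Anything that is not a known connection-level failure is
            // rethrown immediately instead of being retried.
            if (!e || RETRYABLE_CODES.indexOf(e.code) === -1) {
                throw e;
            }
        }
    }
    // All attempts failed with retryable errors; report the last one.
    throw lastError;
}
```

A read would then be wrapped as `retryRead(function () { return db.items.find().toArray(); }, 3)`. Only reads are wrapped, since blindly retrying a write could apply it twice.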
Configuration:
Three machines host a replica set (named testRS), the config servers, and one mongos instance. Sharding is enabled, but no collections are actually sharded.
celestine-1: config1, rs1, mongos
celestine-2: config2, rs2
celestine-3: config3, rs3
There is one user database, "test1", with a single collection, "items", holding two documents (see the session below).
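For reference, the layout above could be written as a replica set configuration document along these lines. This is a reconstruction, not the reporter's actual config: the host names and port 27100 are taken from the error messages in this ticket, while the member `_id` values and the `seedString` helper are assumptions made for illustration.

```javascript
// Hypothetical reconstruction of the testRS layout. Hosts and port come
// from the error messages in this ticket; the _id values are assumed.
var testRSConfig = {
    _id: "testRS",
    members: [
        { _id: 0, host: "celestine-1:27100" },
        { _id: 1, host: "celestine-2:27100" },
        { _id: 2, host: "celestine-3:27100" }
    ]
};

// Builds a "<replicaSet>/<host>:<port>,..." seed string of the kind that
// mongos prints in its connection errors.
function seedString(config) {
    var hosts = config.members.map(function (m) { return m.host; });
    return config._id + "/" + hosts.join(",");
}
```

In a mongo shell connected to one of the members, a document of this shape would be passed to `rs.initiate(testRSConfig)`. The order in which mongos lists the hosts in its own messages may differ from the order of `members`.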
Versions:
mongos 1.7.5 nightly (2011-01-18); used because mongos 1.6.5/1.6.6 causes the mongo shell to fail with an assertion (ERROR: MessagingPort::call() wrong id got:XXX expect:YYY)
mongod 1.7.5 nightly (2011-01-18)
mongo shell 1.7.5 nightly (2011-01-18)
Mongos session:
> db.items.find()
-- bring down the primary member of the replica set (celestine-2 at the moment) here --
> db.items.find()
error: {
    "$err" : "dbclient error communicating with server: celestine-2:27100",
    "code" : 10278
}
> db.items.find()
error: {
    "$err" : "dbclient error communicating with server: celestine-2:27100",
    "code" : 10278
}
> db.items.find()
error: {
    "$err" : "mongos connectionpool: connect failed testRS/celestine-1:27100,celestine-3:27100,celestine-2:27100 : connect failed to set testRS/celestine-1:27100,celestine-3:27100,celestine-2:27100",
    "code" : 11002
}
> db.items.find()
error:
Mongos log:
Wed Jan 19 12:31:46 [Balancer] ~ScopedDbConnection: _conn != null
Wed Jan 19 12:31:46 [Balancer] caught exception while doing balance: DBClientBase::findOne: transport error: celestine-1:27100 query:
Wed Jan 19 12:32:36 [Balancer] ~ScopedDbConnection: _conn != null
Wed Jan 19 12:32:36 [Balancer] caught exception while doing balance: mongos connectionpool: connect failed testRS/celestine-1:27100,celestine-3:27100,celestine-2:27100 : connect failed to set testRS/celestine-1:27100,celestine-3:27100,celestine-2:27100
Wed Jan 19 12:33:06 [Balancer] ~ScopedDbConnection: _conn != null
Wed Jan 19 12:33:06 [Balancer] caught exception while doing balance: mongos connectionpool: connect failed testRS/celestine-1:27100,celestine-3:27100,celestine-2:27100 : connect failed to set testRS/celestine-1:27100,celestine-3:27100,celestine-2:27100