When mongos tries to set up the version for the connection to be used for queries, it checks whether the primary is down with this:
https://github.com/mongodb/mongo/blob/r3.1.5/src/mongo/client/parallel.cpp#L574
bool connIsDown = rawConn->isFailed();
However, if you look at the implementation of isFailed:
return !_master || _master->isFailed();
It returns true if _master is not initialized (when the replica set connection has not yet talked to the master), so a connection that has simply never contacted the primary is reported as down. The reason this was fine in v2.6 is that mongos used to eagerly call setShardVersion on every connection created, so by the time the above codepath is reached, _master is guaranteed to be set unless an error occurred. This is no longer true in v3.0, as SERVER-15375 removed the eager initialization.
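The behaviour described above can be reproduced in isolation with a minimal sketch. Only the isFailed() expression is taken from the code quoted in this ticket; the HostConnection and ReplicaSetConnection types, the connectToMaster() and markMasterFailed() helpers, and the isKnownFailed() alternative are hypothetical names introduced for illustration and are not the actual MongoDB classes or the actual fix.

```cpp
#include <memory>

// Hypothetical stand-in for a connection to a single host.
struct HostConnection {
    bool failed = false;
    bool isFailed() const { return failed; }
};

// Hypothetical stand-in for a replica set connection. Only the
// isFailed() expression mirrors the code quoted in this ticket.
class ReplicaSetConnection {
public:
    // Same expression as quoted above: a null _master counts as failed,
    // even though no attempt to reach the primary has been made yet.
    bool isFailed() const { return !_master || _master->isFailed(); }

    // Assumed alternative for comparison: report failure only when a
    // primary connection exists and that connection has failed.
    bool isKnownFailed() const { return _master && _master->isFailed(); }

    // Simulates the first successful contact with the primary.
    void connectToMaster() {
        if (!_master) _master = std::make_shared<HostConnection>();
    }

    // Simulates an error on an established primary connection.
    void markMasterFailed() {
        if (_master) _master->failed = true;
    }

private:
    std::shared_ptr<HostConnection> _master;  // null until first contact
};
```

With these stand-ins, a freshly constructed ReplicaSetConnection reports isFailed() as true before any contact with the primary, which matches the "was down before" warning being logged for connections that never talked to the master. The hypothetical isKnownFailed() check distinguishes "never connected" from "connected and failed".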
Original description from user:
We are following the procedure for upgrading a MongoDB sharded cluster from http://docs.mongodb.org/manual/release-notes/3.0-upgrade/#upgrade-a-sharded-cluster-to-3-0.
After upgrading one of our main mongoses from 2.6.9 to 3.0.3, we started seeing many of the following messages:
2015-05-27T11:27:46.436+0200 W NETWORK [conn358] Primary for set3/mongo3:27018,mongo8:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
2015-05-27T11:27:46.478+0200 W NETWORK [conn312] Primary for set5/mongo10:27018,mongo5:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
2015-05-27T11:27:46.500+0200 W NETWORK [conn206] Primary for set2/mongo2:27018,mongo7:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
2015-05-27T11:27:46.623+0200 W NETWORK [conn355] Primary for set5/mongo10:27018,mongo5:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
2015-05-27T11:27:46.688+0200 W NETWORK [conn98] Primary for set4/mongo4:27018,mongo9:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
2015-05-27T11:27:46.738+0200 W NETWORK [conn469] Primary for set4/mongo4:27018,mongo9:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
2015-05-27T11:27:46.816+0200 W NETWORK [conn180] Primary for set4/mongo4:27018,mongo9:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
2015-05-27T11:27:46.846+0200 W NETWORK [conn288] Primary for set5/mongo10:27018,mongo5:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
2015-05-27T11:27:46.909+0200 W NETWORK [conn253] Primary for set5/mongo10:27018,mongo5:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
2015-05-27T11:27:46.950+0200 W NETWORK [conn103] Primary for set5/mongo10:27018,mongo5:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
2015-05-27T11:27:47.016+0200 W NETWORK [conn56] Primary for set5/mongo10:27018,mongo5:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
2015-05-27T11:27:47.061+0200 W NETWORK [conn36] Primary for set5/mongo10:27018,mongo5:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
2015-05-27T11:27:47.105+0200 W NETWORK [conn151] Primary for set3/mongo3:27018,mongo8:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
2015-05-27T11:27:47.197+0200 W NETWORK [conn138] Primary for set5/mongo10:27018,mongo5:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
2015-05-27T11:27:47.337+0200 W NETWORK [conn360] Primary for set5/mongo10:27018,mongo5:27018 was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
For now we have rolled back to 2.6.9. Should we continue upgrading the whole cluster, and will those messages be gone after that?
is related to:
- SERVER-22739 Sharding SecondaryPreferred read commands routed to a primary do not handle StaleConfigException (Closed)
- SERVER-15375 initShardVersion triggers inline RS refresh if no primary is available, creating additional latency for user queries (Closed)