...possibly also affects 1.9/2.0.
The core issue is that on a StaleConfigException from a query (handled in the mongos at s/request.cpp), the steps that update the cached shard information from the config server in 1.8.3 no longer always reload the ChunkManager for collections that have not changed. The assumption seems to be that the mongod is more up-to-date than the mongos, so we should not need to call setShardVersion on the mongod unless the mongos config information (ChunkManager) changes (it always changes on reload in 1.8.2). If the mongod sharding metadata is somehow less up-to-date than the mongos, the query will be retried repeatedly until it fails.
A possible fix is to force a reload of the chunk manager after the second retry, in order to handle this case.
Not sure at the moment how this state could come about.
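To make the failure mode concrete, here is a toy model of the retry loop. It is only a sketch, not the actual mongos code: `ToyMongos`, `ToyMongod`, `StaleConfigError`, the integer versions and the `force_after` parameter are all hypothetical names invented for illustration. It assumes a query succeeds only when the mongod's version matches the one the mongos sends, and that setShardVersion is only sent when the cached version changes on reload (the 1.8.3 behaviour described above) or when a reload is forced (the proposed fix).

```python
class StaleConfigError(Exception):
    """Hypothetical stand-in for StaleConfigException."""


class ToyMongod:
    """Toy shard that only remembers the version it was last told."""

    def __init__(self, version):
        self.version = version

    def set_shard_version(self, version):
        self.version = version

    def query(self, expected_version):
        # The query is rejected whenever the two sides disagree.
        if self.version != expected_version:
            raise StaleConfigError("version mismatch")
        return "ok"


class ToyMongos:
    """Toy router; force_after=None models the behaviour described for 1.8.3."""

    def __init__(self, config_version, cached_version, force_after=None):
        self.config_version = config_version  # what the config server holds
        self.cached_version = cached_version  # what the ChunkManager holds
        self.force_after = force_after        # attempt index that forces a reload

    def _refresh(self, mongod, force):
        changed = self.cached_version != self.config_version
        self.cached_version = self.config_version
        # setShardVersion is skipped when nothing changed and no reload is forced.
        if changed or force:
            mongod.set_shard_version(self.cached_version)

    def run_query(self, mongod, max_retries=5):
        # Attempt 0 is the initial try; attempts 1..max_retries are retries.
        for attempt in range(max_retries + 1):
            try:
                mongod.query(self.cached_version)
                return attempt
            except StaleConfigError:
                force = self.force_after is not None and attempt >= self.force_after
                self._refresh(mongod, force)
        raise StaleConfigError("gave up after %d retries" % max_retries)
```

With `ToyMongos(5, 5)` and a `ToyMongod(0)` whose metadata has been reset, every retry fails because the cached version never changes. With `force_after=2`, the reload is forced once the second retry fails and the next attempt succeeds.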
- is related to
  - SERVER-4118 mongos causes dos by opening a ton of connections (Closed)
- related to
  - SERVER-3889 Possible for setShardVersion to never be set on mongod after multiple StaleConfigExceptions due to reset metadata (Closed)