The ReplicaSetMonitor is removed from the ReplicaSetMonitorManager synchronously (i.e., at the end of removeShard()) only on the mongos that runs the removeShard().
All other processes remove the ReplicaSetMonitor the next time they do a ShardRegistry::reload() (which in the worst case happens every 30 seconds) and notice the shard no longer exists in config.shards.
If, after the removeShard(), a new shard is added with the same replica set name and the config server has not done a ShardRegistry::reload() yet, it will use the old shard's ReplicaSetMonitor to target the new shard (including for the addShard checks).
This is because ReplicaSetMonitorManager::getOrCreateMonitor() indexes ReplicaSetMonitor instances by setName instead of by a unique id. What happens next depends on the state of the old shard:
1) If the old shard is still up, the addShard() will (incorrectly) fail with error:
"in seed list mySet/hostname:15516, host hostname:15516 does not belong to replica set mySet; found { hosts: [ \"hostname:15515\" ], setName: \"mySet\", setVersion: 1, ismaster: true, secondary: false, primary: \"hostname:15515\", ..."
2) If the old shard was shut down, then thanks to a lucky pair of additional bugs (see SERVER-26759 and SERVER-26760), the old ReplicaSetMonitor will be removed after the first HostUnreachable response for the old shard, a new ReplicaSetMonitor will be created on the retry, and the addShard will (correctly) succeed.
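The stale-monitor reuse described above can be illustrated with a toy model. This is only a sketch in Python, not the server's C++ code: the classes `Monitor` and `MonitorManager`, the function `simulate`, and the `reload_before_add` flag are all hypothetical names invented for illustration. The sketch assumes that a lookup by set name returns an existing entry without consulting the new seed list, which is the behavior the report describes.

```python
from dataclasses import dataclass, field


@dataclass
class Monitor:
    """Toy stand-in for a ReplicaSetMonitor: remembers the hosts it was seeded with."""
    set_name: str
    hosts: tuple


@dataclass
class MonitorManager:
    """Toy stand-in for a per-process monitor registry keyed by set name only."""
    monitors: dict = field(default_factory=dict)

    def get_or_create_monitor(self, set_name, seed_hosts):
        # Assumption: an existing entry for this set name wins,
        # and the new seed list is ignored.
        if set_name not in self.monitors:
            self.monitors[set_name] = Monitor(set_name, tuple(seed_hosts))
        return self.monitors[set_name]

    def remove_monitor(self, set_name):
        self.monitors.pop(set_name, None)


def simulate(reload_before_add):
    """Return the monitor the config server would use for the new shard."""
    config_server = MonitorManager()
    # The old shard "mySet" lives on port 15515.
    config_server.get_or_create_monitor("mySet", ["hostname:15515"])
    # removeShard() ran on a mongos; the config server drops its own
    # monitor only once it has reloaded its shard registry.
    if reload_before_add:
        config_server.remove_monitor("mySet")
    # addShard for a new shard with the same set name on port 15516.
    return config_server.get_or_create_monitor("mySet", ["hostname:15516"])


stale = simulate(reload_before_add=False)
fresh = simulate(reload_before_add=True)
print(stale.hosts)  # ('hostname:15515',)
print(fresh.hosts)  # ('hostname:15516',)
```

In the first run the lookup returns the monitor that still points at the old host, which matches the "does not belong to replica set" failure in case 1. Keying the registry by something unique per shard, rather than by set name, would avoid the collision in this model.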
is depended on by: SERVER-26785 rewrite addshard2.js to be able to unblacklist it from the last_stable suite (Closed)