From BF description:
There is a race when updating the ShardRegistry and config.shards on mongos. After node0 is removed from a replica set, mongos receives isMaster responses from two nodes in that set, node0 and node1. Node0 responds first and still lists itself, with type 'ghost'; node1's response does not include node0 at all. Mongos updates the topology description with the response from node0, then with the response from node1, and calls onConfirmedSet on the ReplicaSetChangeNotifier for each update. Each notification causes the shard registry to update its information for this replica set and write to config.shards. However, the notifier event triggered by node1's response reaches the shard registry before the event triggered by node0's response. As a result, we first write that the replica set has only two nodes, and then overwrite that entry with one that includes the removed node.
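The ordering problem described above can be illustrated with a toy model. This is only a sketch, not MongoDB code: `ConfirmedSetEvent`, `ToyShardRegistry`, and `replay` are hypothetical names, and the third member `node2` is assumed purely so that the set has two nodes left after node0 is removed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfirmedSetEvent:
    """Hypothetical stand-in for one onConfirmedSet notification."""
    source: str    # node whose isMaster response triggered the event
    hosts: tuple   # hosts that would end up in the connection string


class ToyShardRegistry:
    """Hypothetical registry that applies events with no ordering check."""

    def __init__(self):
        self.config_shards_hosts = None

    def on_confirmed_set(self, event):
        # Last writer wins: whichever event arrives last is what gets stored.
        self.config_shards_hosts = event.hosts


def replay(events):
    """Deliver events in the given order and return the final stored hosts."""
    registry = ToyShardRegistry()
    for event in events:
        registry.on_confirmed_set(event)
    return registry.config_shards_hosts


# node0 was removed but still reports itself (as a ghost); node1 omits it.
from_node0 = ConfirmedSetEvent("node0", ("node0", "node1", "node2"))
from_node1 = ConfirmedSetEvent("node1", ("node1", "node2"))

in_order = replay([from_node0, from_node1])    # order in which mongos saw the responses
reordered = replay([from_node1, from_node0])   # order in which the registry saw the events
```

With in-order delivery the stored host list ends without node0; with the reordered delivery the stale event wins and node0 is written back.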
A possible fix is to not include nodes that are not primaries/secondaries in the connection string passed to the shard registry.
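A minimal sketch of that filtering idea follows, assuming each member is represented as a `(host, server_type)` pair. The function name `connection_string_for_registry`, the lowercase type strings, the set name `rs0`, and the ports are all hypothetical and are not taken from the server code.

```python
# Member types allowed into the connection string (assumed labels).
DATA_BEARING_TYPES = {"primary", "secondary"}


def connection_string_for_registry(set_name, members):
    """Build a 'setName/host1,host2' string from primaries and secondaries only.

    `members` is an iterable of (host, server_type) pairs. Members of any
    other type, such as 'ghost', are left out.
    """
    hosts = [host for host, server_type in members
             if server_type in DATA_BEARING_TYPES]
    return f"{set_name}/{','.join(hosts)}"


members = [
    ("node0:27017", "ghost"),      # removed node that still answers isMaster
    ("node1:27017", "primary"),
    ("node2:27017", "secondary"),
]
filtered = connection_string_for_registry("rs0", members)
```

Under this filter, the event triggered by node0's response would carry the same host list as the one triggered by node1, so the delivery order of the two events would no longer change what is written to config.shards.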
- causes: SERVER-47169 Sharding initialization contacts config shard before ShardRegistry updated by RSM, preventing mongos from starting up (Closed)