- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Sharding
- Fully Compatible
- ALL
- Sharding 2018-07-16, Sharding 2018-07-30, Sharding 2018-08-13, Sharding 2018-09-24, Sharding 2020-03-09, Sharding 2020-03-23
ShardRegistry::replicaSetChangeShardRegistryUpdateHook is registered via ReplicaSetMonitor::setSynchronousConfigChangeHook(), which is explicitly documented as calling the hook while holding the RSM mutex. Unfortunately, that hook acquires ShardRegistryData::_mutex at https://github.com/mongodb/mongo/blob/82b62cf1e513657a0c35d757cf37eab0231ebc9b/src/mongo/s/client/shard_registry.cpp#L526.
The same two mutexes are acquired in the opposite order in ShardRegistryData::toBSON(), which takes ShardRegistryData::_mutex and then transitively calls into ReplicaSetMonitor::getServerAddress(), which takes the RSM mutex. If the two paths run concurrently on different threads, each can end up holding one mutex while waiting for the other, resulting in a deadlock.
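The inversion can be illustrated with a minimal, single-threaded sketch. Everything in it is a hypothetical stand-in (TrackedMutex, Locks, and the two path functions are not MongoDB code); it only records the order in which each path takes the two locks and checks that the orders are reversed. With two threads, one on each path, this is the pattern that can deadlock.

```cpp
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: a mutex that records its name each time it is locked.
class TrackedMutex {
public:
    TrackedMutex(std::string name, std::vector<std::string>& log)
        : _name(std::move(name)), _log(log) {}

    void lock() {
        _m.lock();
        _log.push_back(_name);
    }
    void unlock() {
        _m.unlock();
    }

private:
    std::string _name;
    std::vector<std::string>& _log;
    std::mutex _m;
};

// Hypothetical stand-ins for the RSM mutex and ShardRegistryData::_mutex.
struct Locks {
    std::vector<std::string> log;
    TrackedMutex rsm{"RSM", log};
    TrackedMutex registry{"ShardRegistryData", log};
};

// Mirrors the config change hook: RSM mutex first, then the registry mutex.
void configChangeHookPath(Locks& l) {
    std::lock_guard<TrackedMutex> rsmLock(l.rsm);
    std::lock_guard<TrackedMutex> registryLock(l.registry);
}

// Mirrors toBSON() calling getServerAddress(): registry mutex first, then RSM.
void toBSONPath(Locks& l) {
    std::lock_guard<TrackedMutex> registryLock(l.registry);
    std::lock_guard<TrackedMutex> rsmLock(l.rsm);
}

// Returns true when the two paths take the same two locks in reverse order.
bool ordersAreInverted() {
    Locks a;
    Locks b;
    configChangeHookPath(a);
    toBSONPath(b);
    return a.log.size() == 2 && b.log.size() == 2 &&
        a.log[0] == b.log[1] && a.log[1] == b.log[0];
}
```

Each path runs on its own Locks instance here so the sketch never blocks; the real problem arises when both paths share the same pair of mutexes.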
Suggested Fix
1. Split off a separate serverAddressLock in ReplicaSetMonitor, so that the lock eventually taken from ShardRegistry is not the same lock that is taken in Refresher::_refreshUntilMatches.
2. Build the confirmed server address into a separate variable and update it whenever the seedNodes set in the RSM is updated, since confirmedServerAddress is built from seedNodes.
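The two suggestions could be combined roughly as follows. This is a sketch under stated assumptions, not the actual patch: MonitorSketch, its member names, and the "setName/host,host" address format are hypothetical illustrations. The address mutex is a leaf lock (nothing else is acquired while it is held), so a caller that already holds the registry mutex can read the address without ever touching the main state mutex.

```cpp
#include <mutex>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for ReplicaSetMonitor; not the real class.
class MonitorSketch {
public:
    explicit MonitorSketch(std::string setName) : _setName(std::move(setName)) {}

    // Update path: takes the main state mutex, then briefly the address mutex
    // to publish the rebuilt address alongside the new seed nodes.
    void setSeedNodes(std::set<std::string> nodes) {
        std::lock_guard<std::mutex> stateLock(_stateMutex);
        _seedNodes = std::move(nodes);

        // Assumed format for illustration: "<setName>/<host>,<host>,...".
        std::string address = _setName + "/";
        bool first = true;
        for (const auto& node : _seedNodes) {
            if (!first) {
                address += ",";
            }
            address += node;
            first = false;
        }

        std::lock_guard<std::mutex> addressLock(_addressMutex);
        _confirmedServerAddress = std::move(address);
    }

    // Read path used by the registry: only the dedicated address mutex is
    // taken, never the main state mutex.
    std::string getServerAddress() const {
        std::lock_guard<std::mutex> addressLock(_addressMutex);
        return _confirmedServerAddress;
    }

private:
    const std::string _setName;

    std::mutex _stateMutex;           // guards _seedNodes
    std::set<std::string> _seedNodes;

    mutable std::mutex _addressMutex; // guards _confirmedServerAddress only
    std::string _confirmedServerAddress;
};
```

Because the address is cached when the seed nodes change, the read path copies a string instead of rebuilding it, which keeps the time spent under the address mutex short.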