- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Networking & Observability
- ALL
The ShardRegistry::initConfigShard function takes ShardRegistry::_mutex and then constructs a connection pool under that mutex. Constructing the connection pool calls EgressConnectionCloserManager::add, which takes EgressConnectionCloserManager::_mutex. So this code path takes the ECM mutex while the SR mutex is held.
initWireVersion calls EgressConnectionCloserManager::setKeepOpen, which takes the ECM mutex and then calls ConnectionPool::setKeepOpen, which takes the ConnectionPool mutex while the ECM mutex is still held.
Finally, ConnectionPool::get takes the ConnectionPool mutex, and then calls ShardingTaskExecutorPoolController::addHost, which ends up taking the ShardRegistry mutex.
So we have a lock-ordering cycle: the SR mutex > the ECM mutex, the ECM mutex > the ConnPool mutex, and the ConnPool mutex > the SR mutex.
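For illustration, here is a minimal standalone sketch of the three acquisition orders using plain std::mutex; the function and mutex names are stand-ins for the real MongoDB code paths, not the actual types. Run under TSAN, this acquisition pattern is what the deadlock detector reports as a lock-order-inversion even when no thread actually deadlocks.

```cpp
#include <mutex>

// Stand-ins for the three real locks (names are illustrative):
std::mutex srMutex;    // ShardRegistry::_mutex
std::mutex ecmMutex;   // EgressConnectionCloserManager::_mutex
std::mutex poolMutex;  // the ConnectionPool mutex

// Edge 1 (initConfigShard): SR mutex taken, then the ECM mutex via
// EgressConnectionCloserManager::add during connection-pool construction.
void initConfigShardPath() {
    std::lock_guard<std::mutex> sr(srMutex);
    std::lock_guard<std::mutex> ecm(ecmMutex);
}

// Edge 2 (initWireVersion -> setKeepOpen): ECM mutex taken, then the
// ConnectionPool mutex via ConnectionPool::setKeepOpen.
void setKeepOpenPath() {
    std::lock_guard<std::mutex> ecm(ecmMutex);
    std::lock_guard<std::mutex> pool(poolMutex);
}

// Edge 3 (ConnectionPool::get): ConnectionPool mutex taken, then the SR
// mutex via ShardingTaskExecutorPoolController::addHost.
void getConnectionPath() {
    std::lock_guard<std::mutex> pool(poolMutex);
    std::lock_guard<std::mutex> sr(srMutex);
}

int main() {
    // Even run sequentially on one thread, these three orders form the
    // SR > ECM > ConnPool > SR cycle in the lock graph, which is what
    // TSAN's deadlock detection flags as a potential deadlock.
    initConfigShardPath();
    setKeepOpenPath();
    getConnectionPath();
    return 0;
}
```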
I think the risk of encountering this as a real deadlock is extremely low since we only init the config shard at startup, but it requires a TSAN suppression and makes the code harder to reason about, so it may be worth fixing.
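For reference, the suppression would be an entry in a ThreadSanitizer suppressions file along these lines; the symbol pattern below is illustrative of where the report would be matched, not the exact entry used.

```
# Hypothetical TSAN suppressions-file entry for this report;
# the symbol pattern is illustrative.
deadlock:mongo::ShardRegistry::initConfigShard
```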
- related to: SERVER-88159 mongo::Mutex masks TSAN's ability to detect a lock order inversion (Closed)