  Core Server / SERVER-91826

Lock order inversion between EgressConnectionCloserManager::_mutex, ShardRegistry::_mutex, and ConnectionPool::_mutex

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Networking & Observability
    • ALL

      The ShardRegistry::initConfigShard function takes ShardRegistry::_mutex and then constructs a connection pool under that mutex. Constructing the connection pool calls EgressConnectionCloserManager::add, which takes EgressConnectionCloserManager::_mutex. So this code path acquires the ECM mutex while the SR mutex is held.

      initWireVersion calls EgressConnectionCloserManager::setKeepOpen, which takes the ECM mutex. This then calls ConnectionPool::setKeepOpen, which takes the connection pool mutex (while the ECM mutex is held).

      Finally, ConnectionPool::get takes the ConnectionPool mutex, and then calls ShardingTaskExecutorPoolController::addHost, which ends up taking the ShardRegistry mutex.

      So we have a lock-ordering cycle: SR mutex > ECM mutex, ECM mutex > ConnPool mutex, and ConnPool mutex > SR mutex.
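
      To make the cycle concrete, here is a minimal, standalone sketch of the same three-mutex pattern. Everything below is illustrative stand-in code, not the actual server implementation:

          #include <mutex>
          #include <thread>

          std::mutex srMutex;    // stands in for ShardRegistry::_mutex
          std::mutex ecmMutex;   // stands in for EgressConnectionCloserManager::_mutex
          std::mutex poolMutex;  // stands in for ConnectionPool::_mutex

          // Path 1: initConfigShard constructs a ConnectionPool under the SR
          // mutex, and the pool's construction registers with the ECM.
          void initConfigShardPath() {
              std::lock_guard<std::mutex> sr(srMutex);
              std::lock_guard<std::mutex> ecm(ecmMutex);  // SR > ECM
          }

          // Path 2: setKeepOpen takes the ECM mutex and then a pool's mutex.
          void setKeepOpenPath() {
              std::lock_guard<std::mutex> ecm(ecmMutex);
              std::lock_guard<std::mutex> pool(poolMutex);  // ECM > ConnPool
          }

          // Path 3: ConnectionPool::get takes the pool mutex and then reaches
          // the SR mutex via ShardingTaskExecutorPoolController::addHost.
          void getPath() {
              std::lock_guard<std::mutex> pool(poolMutex);
              std::lock_guard<std::mutex> sr(srMutex);  // ConnPool > SR
          }

          int main() {
              // If each path runs on its own thread, each thread can hold its
              // first mutex while blocking on the next, and all three deadlock.
              std::thread t1(initConfigShardPath), t2(setKeepOpenPath), t3(getPath);
              t1.join(); t2.join(); t3.join();
          }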

      I think the risk of encountering this as a real deadlock is extremely low since we only init the config shard at startup, but it requires a TSAN suppression and makes the code harder to reason about, so it may be worth fixing.
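
      For reference, TSAN reports this class of problem as a lock-order-inversion, and the usual way to silence it is a deadlock: entry in a suppressions file passed via TSAN_OPTIONS="suppressions=<path>". A hypothetical entry (the frame named here is illustrative; the real suppression may point at a different function):

          # tsan.suppressions (hypothetical entry for illustration)
          deadlock:EgressConnectionCloserManager::setKeepOpen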

            Assignee:
            Unassigned
            Reporter:
            George Wangensteen (george.wangensteen@mongodb.com)
            Votes:
            0
            Watchers:
            3
