- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Networking & Observability
- ALL
The ShardRegistry::initConfigShard function takes ShardRegistry::_mutex and then constructs a connection pool under that mutex. Constructing the connection pool calls EgressConnectionCloserManager::add, which takes EgressConnectionCloserManager::_mutex. So this code path takes the ECM mutex while the SR mutex is held.
initWireVersion calls EgressConnectionCloserManager::setKeepOpen, which takes the ECM mutex and then calls ConnectionPool::setKeepOpen, which takes the ConnectionPool mutex while the ECM mutex is still held.
Finally, ConnectionPool::get takes the ConnectionPool mutex, and then calls ShardingTaskExecutorPoolController::addHost, which ends up taking the ShardRegistry mutex.
So we have a lock-ordering cycle: the SR mutex > the ECM mutex, the ECM mutex > the ConnPool mutex, and the ConnPool mutex > the SR mutex.
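For illustration, here is a minimal standalone sketch of the three acquisition orders using plain std::mutex; the function and mutex names are stand-ins for the real MongoDB code paths, not the actual types. Run under TSAN, this acquisition pattern is what the deadlock detector reports as a lock-order-inversion even when no thread actually deadlocks.

```cpp
#include <mutex>

// Stand-ins for the three real locks (names are illustrative):
std::mutex srMutex;    // ShardRegistry::_mutex
std::mutex ecmMutex;   // EgressConnectionCloserManager::_mutex
std::mutex poolMutex;  // the ConnectionPool mutex

// Edge 1 (initConfigShard): SR mutex taken, then the ECM mutex via
// EgressConnectionCloserManager::add during connection-pool construction.
void initConfigShardPath() {
    std::lock_guard<std::mutex> sr(srMutex);
    std::lock_guard<std::mutex> ecm(ecmMutex);
}

// Edge 2 (initWireVersion -> setKeepOpen): ECM mutex taken, then the
// ConnectionPool mutex via ConnectionPool::setKeepOpen.
void setKeepOpenPath() {
    std::lock_guard<std::mutex> ecm(ecmMutex);
    std::lock_guard<std::mutex> pool(poolMutex);
}

// Edge 3 (ConnectionPool::get): ConnectionPool mutex taken, then the SR
// mutex via ShardingTaskExecutorPoolController::addHost.
void getConnectionPath() {
    std::lock_guard<std::mutex> pool(poolMutex);
    std::lock_guard<std::mutex> sr(srMutex);
}

int main() {
    // Even run sequentially on one thread, these three orders form the
    // SR > ECM > ConnPool > SR cycle in the lock graph, which is what
    // TSAN's deadlock detection flags as a potential deadlock.
    initConfigShardPath();
    setKeepOpenPath();
    getConnectionPath();
    return 0;
}
```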
I think the risk of encountering this as a real deadlock is extremely low since we only init the config shard at startup, but it requires a TSAN suppression and makes the code harder to reason about, so it may be worth fixing.
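For reference, the suppression would be an entry in a ThreadSanitizer suppressions file along these lines; the symbol pattern below is illustrative of where the report would be matched, not the exact entry used.

```
# Hypothetical TSAN suppressions-file entry for this report;
# the symbol pattern is illustrative.
deadlock:mongo::ShardRegistry::initConfigShard
```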
- related to: SERVER-88159 mongo::Mutex masks TSAN's ability to detect a lock order inversion (Closed)