- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Sharding
- Fully Compatible
- ALL
- v4.0
- Sharding 2019-04-22
- 5
The NamespaceSerializer is essentially an in-memory cache of the distributed lock meant to synchronize sharded metadata operations that must run on the config server primary, like _configsvrCreateCollection and _configsvrDropCollection. Roughly, the class works like this:
- Threads wishing to lock a namespace call NamespaceSerializer::lock(), which takes a class mutex.
- Inside, it looks up the namespace in a map whose entries each contain a condition variable, a waiters counter, and an inProgress boolean.
- After this, the method returns a ScopedLock object whose destructor decrements the waiters counter, sets inProgress to false, and calls notify_one() on the condition variable.
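The steps above can be approximated with a small standard-C++ model. This is only a sketch under assumptions, not the server code: the names NamespaceSerializerModel, Entry, and entryCount() are made up for illustration, a fresh entry is assumed to start with waiters = 1 and inProgress = true, and a plain std::condition_variable wait stands in for the server's interruptible wait.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Illustrative stand-in for the class described above; not the real implementation.
class NamespaceSerializerModel {
    struct Entry {
        std::condition_variable cv;
        int waiters = 1;         // assumption: the first locker counts as one
        bool inProgress = true;  // assumption: a new entry starts out held
    };

    std::mutex _mutex;
    std::map<std::string, std::shared_ptr<Entry>> _entries;

public:
    class ScopedLock {
    public:
        ScopedLock(NamespaceSerializerModel& s, std::string ns) : _s(&s), _ns(std::move(ns)) {}
        ScopedLock(ScopedLock&& o) noexcept : _s(o._s), _ns(std::move(o._ns)) { o._s = nullptr; }
        ScopedLock(const ScopedLock&) = delete;
        ScopedLock& operator=(const ScopedLock&) = delete;

        ~ScopedLock() {
            if (!_s) return;  // moved-from object owns nothing
            std::lock_guard<std::mutex> lk(_s->_mutex);
            auto it = _s->_entries.find(_ns);
            assert(it != _s->_entries.end());
            Entry& e = *it->second;
            --e.waiters;
            e.inProgress = false;
            e.cv.notify_one();
            // The entry is dropped only once nobody holds or waits on it.
            if (e.waiters == 0) _s->_entries.erase(it);
        }

    private:
        NamespaceSerializerModel* _s;
        std::string _ns;
    };

    ScopedLock lock(const std::string& ns) {
        std::unique_lock<std::mutex> lk(_mutex);
        auto it = _entries.find(ns);
        if (it == _entries.end()) {
            _entries.emplace(ns, std::make_shared<Entry>());
        } else {
            auto entry = it->second;
            ++entry->waiters;  // happens before any ScopedLock exists
            entry->cv.wait(lk, [&] { return !entry->inProgress; });
            entry->inProgress = true;
        }
        return ScopedLock(*this, ns);
    }

    std::size_t entryCount() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _entries.size();
    }
};
```

In the uncontended case, taking the lock for "db.coll" creates one entry, and letting the ScopedLock go out of scope removes it again, so entryCount() goes from 1 back to 0.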
The condition variable wait and the waiters counter increment both happen before the ScopedLock object is created, and the wait is interruptible. A request with maxTimeMS (or one that is killed) may therefore throw after incrementing the counter, without a ScopedLock destructor ever running to decrement it. The counter can then never reach 0, and the entry for the namespace will never be removed.
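The leak can be reproduced deterministically in a single-threaded sketch. Everything here is a simplified assumption rather than the server code: acquire() and release() stand in for lock() and the ScopedLock destructor, the interrupted flag stands in for a maxTimeMS expiry or a kill, and a new entry is assumed to start with waiters = 1.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <string>

struct Entry {
    std::condition_variable cv;
    int waiters = 1;         // assumption: the first locker counts as one
    bool inProgress = true;  // assumption: a new entry starts out held
};

struct Serializer {
    std::mutex mutex;
    std::map<std::string, std::shared_ptr<Entry>> entries;

    // Stand-in for lock(); `interrupted` models maxTimeMS expiry or a kill.
    // It is read without extra synchronization because this sketch is single-threaded.
    void acquire(const std::string& ns, const bool& interrupted) {
        std::unique_lock<std::mutex> lk(mutex);
        auto it = entries.find(ns);
        if (it == entries.end()) {
            entries.emplace(ns, std::make_shared<Entry>());
            return;
        }
        auto e = it->second;
        ++e->waiters;
        e->cv.wait(lk, [&] { return interrupted || !e->inProgress; });
        if (interrupted) {
            // Thrown after the increment; nothing undoes it.
            throw std::runtime_error("interrupted");
        }
        e->inProgress = true;
    }

    // Stand-in for the ScopedLock destructor.
    void release(const std::string& ns) {
        std::lock_guard<std::mutex> lk(mutex);
        auto it = entries.find(ns);
        assert(it != entries.end());
        Entry& e = *it->second;
        --e.waiters;
        e.inProgress = false;
        e.cv.notify_one();
        if (e.waiters == 0) entries.erase(it);
    }
};

// Returns how many map entries remain after the holder has released.
std::size_t runLeakScenario() {
    Serializer s;
    const bool never = false;
    const bool expired = true;
    s.acquire("db.coll", never);  // first request holds the namespace
    try {
        s.acquire("db.coll", expired);  // second request is interrupted while waiting
    } catch (const std::runtime_error&) {
    }
    s.release("db.coll");  // holder finishes: waiters goes 2 -> 1, never 0
    return s.entries.size();
}
```

With these assumptions, runLeakScenario() leaves one entry behind even though no request holds or waits on the namespace any more.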
Interestingly, the condition variable's predicate becomes true once the ScopedLock that the interrupted request was waiting on is destructed (because inProgress is set to false), so the next attempt to lock the serializer should succeed without waiting. However, because the destructor uses notify_one, if more than one thread was waiting on the lock and the interrupted request was the one signaled, the other waiter(s) will hang.
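One possible way to address both symptoms, shown here only as a hypothetical sketch and not as the fix that was actually committed, is an RAII guard around the wait: if the wait throws, the guard undoes the increment and calls notify_one() again so that a wakeup consumed by the interrupted thread is passed on to another waiter. WaiterGuard, waitForTurn, and the interrupted flag are illustrative names, and Entry mirrors the fields described above.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <stdexcept>

struct Entry {
    std::condition_variable cv;
    int waiters = 1;         // assumption: the current holder counts as one
    bool inProgress = true;  // assumption: the namespace is currently held
};

// Hypothetical helper: registers a waiter and unregisters it again unless dismissed.
class WaiterGuard {
public:
    explicit WaiterGuard(Entry& e) : _e(e) { ++_e.waiters; }
    WaiterGuard(const WaiterGuard&) = delete;
    WaiterGuard& operator=(const WaiterGuard&) = delete;

    ~WaiterGuard() {
        if (_dismissed) return;
        --_e.waiters;
        _e.cv.notify_one();  // forward a wakeup this thread may have consumed
    }

    void dismiss() { _dismissed = true; }

private:
    Entry& _e;
    bool _dismissed = false;
};

// The caller must hold `lk` on the mutex that protects `e`.
// `interrupted` models maxTimeMS expiry or a kill.
void waitForTurn(Entry& e, std::unique_lock<std::mutex>& lk, const bool& interrupted) {
    assert(lk.owns_lock());
    WaiterGuard guard(e);
    e.cv.wait(lk, [&] { return interrupted || !e.inProgress; });
    if (interrupted) {
        throw std::runtime_error("interrupted");  // guard restores the counter
    }
    e.inProgress = true;
    guard.dismiss();  // from here on, the release path owns the decrement
}
```

An interrupted call leaves waiters at its previous value, while a successful call keeps the increment for the eventual release to undo. A complete solution would also need to remove the map entry when the guard itself brings the counter to 0; that part is omitted here.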