- Type: Improvement
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: None
- None
- Server Programmability
- SP Prioritized List
In SERVER-89893 we discovered that we mistakenly used the arbitrary executor for an operation that could potentially block. In order to fix it we had to change it to the fixed executor.
This exposed a misunderstanding of how the executors behave, one that could deadlock the server if the same mistake were made in more critical places.
The current convention in sharding code is:
- Blocking operations should use the fixed executor
- Non-blocking operations should use the arbitrary executor
Additionally, the executors behave differently depending on the deployment:
Sharded clusters
- The fixed executor has an unlimited (not fixed) number of threads in its pool
- Arbitrary executor uses the underlying networking event loop. This is used for all query work.
Replica sets
- An unlimited number of threads is used for all query work.
This is dangerous: if we were to perform ANY blocking work on the arbitrary executor in a sharded cluster, we would risk deadlocking the server, because only a fixed number of threads process the event loop.
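The failure mode can be reproduced outside the server with a small sketch. Everything below is hypothetical illustration code, not MongoDB's executor implementation: `FixedSizePool` stands in for an event loop served by a bounded set of threads, and `innerTaskCompletes` is an assumed helper name. An outer task schedules an inner task on the same pool and then blocks waiting for it; the wait uses a timeout only so that the example terminates instead of hanging.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool with a fixed number of worker threads.
// It is a stand-in for a bounded event loop, not MongoDB code.
class FixedSizePool {
public:
    explicit FixedSizePool(std::size_t numThreads) {
        for (std::size_t i = 0; i < numThreads; ++i) {
            _workers.emplace_back([this] { _workerLoop(); });
        }
    }

    // Drains any queued tasks, then joins all workers.
    ~FixedSizePool() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _shutdown = true;
        }
        _cv.notify_all();
        for (auto& worker : _workers) {
            worker.join();
        }
    }

    void schedule(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _tasks.push(std::move(task));
        }
        _cv.notify_one();
    }

private:
    void _workerLoop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lk(_mutex);
                _cv.wait(lk, [this] { return _shutdown || !_tasks.empty(); });
                if (_tasks.empty()) {
                    return;  // Shutdown requested and the queue is drained.
                }
                task = std::move(_tasks.front());
                _tasks.pop();
            }
            task();
        }
    }

    std::mutex _mutex;
    std::condition_variable _cv;
    std::queue<std::function<void()>> _tasks;
    std::vector<std::thread> _workers;
    bool _shutdown = false;
};

// Runs an outer task that schedules an inner task on the same pool and then
// blocks on it. Returns true if the inner task finished within `timeout`,
// false if the outer task gave up waiting.
bool innerTaskCompletes(std::size_t numThreads, std::chrono::milliseconds timeout) {
    // Declared before the pool so they outlive the worker threads.
    std::promise<bool> outerResult;
    std::future<bool> outerFuture = outerResult.get_future();
    FixedSizePool pool(numThreads);

    pool.schedule([&] {
        auto innerDone = std::make_shared<std::promise<void>>();
        std::future<void> innerFuture = innerDone->get_future();
        pool.schedule([innerDone] { innerDone->set_value(); });
        // Blocking wait on a pool thread. With one thread, nothing is left to
        // run the inner task; without the timeout this would never return.
        bool ready = innerFuture.wait_for(timeout) == std::future_status::ready;
        outerResult.set_value(ready);
    });

    return outerFuture.get();
}
```

With a single worker thread, `innerTaskCompletes(1, std::chrono::milliseconds(200))` reports that the inner task never ran while the outer task was waiting. With two threads the same call succeeds. A pool that can always add a thread does not have this problem, which is why blocking work is only safe on an executor with that property.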
All of this stems from the naming conventions used here. At this point, the decision of whether to use the fixed or the arbitrary executor depends on whether the developer knows what the underlying executors actually are. Ideally we should rename them after their intended use, namely "Blocking Executor" and "Non-blocking Executor", to be in line with thread pool best practices. This would prevent future problems and clarify when to use which.
is related to
- SERVER-89893 Change executor used by _flushReshardingStateChange from arbitrary to fixed (Closed)
related to
- SERVER-90633 Protect shards from aggressive commitTransaction (et. al.) operations (Backlog)
- SERVER-90730 Investigate short-term improvements to TransactionCoordinator to reduce the number of threads it spawns (Open)
- SERVER-90729 Revisit design of TransactionCoordinator with goal to bound the number of threads it spawns (Backlog)