Type: Bug
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: None
Networking & Observability
ALL
N&O Prioritized List
In SERVER-54504, which landed in 6.2, we removed the ability to tune the taskExecutorPoolSize, and instead always set it to 1.
I believe this was due to the addition of client-thread polling in the baton in SERVER-34739: my theory is that we were seeing heavy lock contention when we used the baton to run work on client threads while also using more than one ShardingTaskExecutor (SERVER-77539 demonstrates this).
However, some customers (see linked HELP tickets) are experiencing performance regressions when they set taskExecutorPoolSize to 1, which prevents them from upgrading to 7.0+. These customers run huge sharded clusters with queries that hit many shards in the cluster, resulting in a higher-than-usual load on the mongos's egress networking stack. In one of these cases, we saw very high waitTime metrics on the ShardingTaskExecutor reactor thread, suggesting that the single reactor thread and client baton were unable to keep up with the heavy load of egress networking requests.
We should re-evaluate the decision to fix taskExecutorPoolSize at 1 on 6.2+ given these customers' needs, and understand whether there are limits to the single-ShardingTaskExecutor model that we were previously unaware of.
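For reference, a minimal sketch of how the pool size was tuned before SERVER-54504 landed in 6.2, assuming the standard setParameter startup mechanism (the value 4 is purely illustrative):

    # at mongos startup, on the command line:
    mongos --setParameter taskExecutorPoolSize=4

    # or equivalently in the mongos YAML config file:
    setParameter:
      taskExecutorPoolSize: 4

The affected customers would presumably want to keep using something like the above on 7.0+ instead of being pinned to a single pool.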
- is related to: SERVER-96848 Reject work if reactor is overwhelmed (Backlog)
- split to: SERVER-102477 Revert change preventing tuning taskExecutorPoolSize on 7.0+ (Blocked)