Core Server / SERVER-102410

Investigate performance regressions on large sharded clusters with taskExecutorPoolSize=1

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Networking & Observability
    • ALL
    • N&O Prioritized List

      In SERVER-54504, which landed in 6.2, we removed the ability to tune taskExecutorPoolSize and instead fixed it at 1.

      I believe this was due to the addition of client-thread polling in the baton in SERVER-34739. My theory is that we were seeing heavy lock contention when we used the baton to run work on client threads while also using more than one ShardingTaskExecutor (SERVER-77539 demonstrates this).

      However, some customers (see linked HELP tickets) experience performance regressions when taskExecutorPoolSize is set to 1, which prevents them from upgrading to 7.0+. These customers run workloads on very large sharded clusters with queries that fan out to many shards, resulting in higher-than-usual load on the mongos egress networking stack. In one case, we saw very high waitTime metrics on the ShardingTaskExecutor reactor thread, suggesting that the single reactor thread and client baton were unable to keep up with the heavy load of egress networking requests.

      We should re-evaluate the decision to fix taskExecutorPoolSize at 1 on 6.2+ given these customers' needs, and understand whether there are limits to the single-ShardingTaskExecutor model that we were previously unaware of.
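
      For reference, before SERVER-54504 the pool size could be tuned at mongos startup via setParameter. A minimal sketch of the pre-6.2 configuration; the value 4 is illustrative only, not a recommendation:

      ```yaml
      # mongos configuration file (pre-6.2, when the parameter was still tunable).
      # taskExecutorPoolSize controls the number of ShardingTaskExecutor
      # connection pools / reactor threads used for egress networking.
      setParameter:
        taskExecutorPoolSize: 4
      ```

      The same parameter could be passed on the command line as --setParameter taskExecutorPoolSize=4. On 6.2+ this knob is ignored and the pool size is always 1, which is the behavior under investigation here.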

            Assignee:
            Unassigned
            Reporter:
            Erin McNulty (erin.mcnulty@mongodb.com)
            Votes:
            0
            Watchers:
            13

              Created:
              Updated: