  1. Core Server
  2. SERVER-37810

Optimise balancer performance with zone sharding

    • Sharding EMEA
    • 2

      Reproduced in MongoDB 3.4.16 and 4.0.3.

      With a considerable number of chunks (1+ million), the balancer is observed to spend a large amount of time checking whether each chunk belongs to a tag. As a result, a balancer round can spend most of its time finding a candidate chunk (e.g. one minute) rather than migrating it. This can significantly degrade overall cluster balancing performance.
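To illustrate why the per-chunk tag check can dominate at this scale, here is a minimal sketch, not the server's actual code. It assumes a hypothetical integer shard key, non-overlapping zone ranges sorted by their lower bound, and that a chunk is matched by its minimum key only. The names `ZONES`, `zone_linear` and `zone_bisect` are made up for illustration. The linear version performs up to one comparison pair per zone for every chunk, while the sorted lookup needs only a logarithmic number of comparisons.

```python
from bisect import bisect_right

# Hypothetical zone ranges: (min_inclusive, max_exclusive, zone_name).
# Assumed sorted by min and non-overlapping.
ZONES = [(0, 100, "EU"), (100, 200, "US"), (200, 300, "APAC")]
ZONE_MINS = [lo for lo, _, _ in ZONES]


def zone_linear(chunk_min, zones=ZONES):
    """Scan the zone ranges in order until one contains chunk_min.

    Cost per chunk grows linearly with the number of zones.
    """
    for lo, hi, name in zones:
        if lo <= chunk_min < hi:
            return name
    return None


def zone_bisect(chunk_min, zones=ZONES, mins=ZONE_MINS):
    """Binary-search the sorted lower bounds, then check the upper bound.

    Cost per chunk grows logarithmically with the number of zones.
    """
    i = bisect_right(mins, chunk_min) - 1
    if i >= 0 and chunk_min < zones[i][1]:
        return zones[i][2]
    return None
```

Both functions return the same zone for any key; only the number of key comparisons per chunk differs, which is what matters when the check is repeated for more than a million chunks in every round.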

      Below is a repro in which the balancer spends 90% of its time finding a candidate chunk and only 10% of its time moving the chunk.
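As a rough back-of-the-envelope illustration of what the 90%/10% split implies, the sketch below applies Amdahl's law to a balancer round. It assumes a round consists only of candidate selection and migration; `max_round_speedup` is a hypothetical helper, not part of MongoDB.

```python
def max_round_speedup(selection_fraction):
    """Upper bound on the speedup of a balancer round if the time spent
    selecting a candidate chunk were reduced to zero (Amdahl's law)."""
    if not 0 <= selection_fraction < 1:
        raise ValueError("selection_fraction must be in [0, 1)")
    return 1.0 / (1.0 - selection_fraction)


# With 90% of the round spent on selection, the bound is roughly 10x.
speedup = max_round_speedup(0.9)
```

Under these assumptions, removing the selection cost entirely could let the balancer complete up to about ten times as many migrations in the same wall-clock time.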

      Off-CPU profiling suggests that the balancer thread is CPU-bound. Attached is a 60-second flame graph of the 3.4.16 CSRS primary process. The CSRS primary was doing nothing other than balancing the cluster at that time.

      Most CPU time is consumed in BSONObj::woCompare().

        1. onCPU-CSRS-primary.png
          630 kB
          Josef Ahmad

            Assignee:
            tommaso.tocci@mongodb.com Tommaso Tocci
            Reporter:
            josef.ahmad@mongodb.com Josef Ahmad
            Votes:
            4
            Watchers:
            32

              Created:
              Updated:
              Resolved: