- Type: Question
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: 3.2.10
- Component/s: Concurrency
Hey everyone, we have a production replica set consisting of three nodes that has been running well for two years. It looks like this from a configuration standpoint:
- 3 m4.2xlarge instances (8 cores)
- xfs filesystem for data drives
- ebs volumes (with 8000 provisioned iops – max for instance type)
- wired tiger storage engine
- ssl, auth, etc
The performance is great, but we want to scale up our nodes to handle a potential spike in usage over the next two weeks. As we are not particularly i/o bound given our usage of MongoDB, and appear to be largely cpu bound on these boxes (from what I can tell), we have transitioned these nodes from m4.2xlarge (8 cores) to m4.4xlarge (16 cores).
To my surprise, it appears as though mongod is only using the first 8 (0-7) of the 16 cores available on this machine. Now, I realize that:
- In going from 8 to 16 cores we may now have two physical cpus backing our instance
- taskset and cpuset can be used to set core/processor affinity, and I do not believe they are in use (we are using the init script from the Amazon linux package); a quick check against the running process is sketched after this list
- numactl should specify that memory usage be interleaved instead of preferring a node or physical cpu (again, confirmed via the package init script)
- Using `htop` as a view onto cpu usage on virtualized hardware is a potentially flawed metric for various reasons
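In case it's useful, here is a minimal sketch of how I'd confirm from the running process that no affinity mask is restricting mongod to a subset of cores. This is just an illustration, not part of our actual tooling: it assumes Linux, a single mongod process, and uses `pidof` to find the PID.

```python
import os
import subprocess

# Find the mongod PID (assumes exactly one mongod is running).
pid = int(subprocess.check_output(["pidof", "mongod"]).split()[0])

# The affinity mask the scheduler is allowed to use for this process.
allowed = os.sched_getaffinity(pid)
print(f"cores visible to the OS : {os.cpu_count()}")
print(f"cores mongod may run on : {len(allowed)} -> {sorted(allowed)}")

# The same CPU mask (plus the NUMA memory mask) as reported by /proc.
with open(f"/proc/{pid}/status") as f:
    for line in f:
        if line.startswith(("Cpus_allowed_list", "Mems_allowed_list")):
            print(line.strip())
```

If that reports all 16 cores (0-15), then taskset/cpuset are presumably not the culprit.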
So my question is: why does mongod appear to use only 8 of the 16 cores available on these boxen?
One possibility is that the linux scheduler doesn't bother placing tasks on the second physical cpu until there are more runnable threads, so as to take advantage of CPU caches. Right now there is not much load to speak of, so that is my current working theory.
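One rough way to test that theory (again just a sketch, assuming Linux /proc and a `pidof` lookup; the parsing follows the documented /proc/[pid]/stat layout) would be to count which core each mongod thread last ran on. If the threads still pile onto cores 0-7 once real load arrives, the scheduler theory starts to look wrong:

```python
import collections
import glob
import subprocess

# Find the mongod PID (assumes exactly one mongod is running).
pid = subprocess.check_output(["pidof", "mongod"]).split()[0].decode()

last_cpu = collections.Counter()
for stat_path in glob.glob(f"/proc/{pid}/task/*/stat"):
    with open(stat_path) as f:
        # Split after the "(comm)" field; the 'processor' field (last CPU
        # the thread ran on) is field 39 of stat, i.e. index 36 here.
        fields = f.read().rsplit(")", 1)[1].split()
        last_cpu[int(fields[36])] += 1

for cpu, count in sorted(last_cpu.items()):
    print(f"cpu {cpu:2d}: {count} thread(s)")
```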
Having never run mongod on a machine with multiple physical cpus in production before, I'm only guessing at what the issue may be. Any clues as to what I might be seeing, 10geneers?