Following upgrades from 2.6.9 to 3.0.9 (still using MMAPv1), we noticed significantly higher disk IO against the volume hosting MongoDB's data files.
This has become particularly apparent on replica sets with large numbers of databases (multiple thousands).
From investigation, this appears to be caused by a change in MongoDB's behaviour when reading ns files.
To give a precise example, we have a replica set that is currently in the process of being upgraded. It has 3 x 2.6.9 nodes and 1 x 3.0.9 node (hidden, non-voting).
The replica set has 5570 databases and uses the 16MB default ns size. If MongoDB loaded all of these ns files into memory, it would require 87GB of memory.
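For reference, the 87GB figure is simply the database count multiplied by the default ns file size (a rough check that assumes every database has its full 16MB .ns file allocated):
# echo "5570 * 16 / 1024" | bc -l
87.03125000000000000000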
The existing 2.6.9 nodes run comfortably as EC2 r3.large instances (14GB RAM), and running vmtouch shows that only a tiny percentage of the ns file pages are loaded into the filesystem cache:
# ./vmtouch -v /var/lib/mongodb/*.ns | tail -5
Files: 5570
Directories: 0
Resident Pages: 188549/22814720 736M/87G 0.826%
Elapsed: 0.97846 seconds
However, running the 3.0.9 node as an r3.large makes it unusable, as the filesystem cache is constantly flooded with the ns files (and the server takes 1hr 26 mins to start):
# ./vmtouch -v /var/lib/mongodb/*.ns | tail -5
Files: 5570
Directories: 0
Resident Pages: 2905047/22814720 11G/87G 12.7%
Elapsed: 0.67599 seconds
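If it helps with reproduction, the same residency figure can be sampled continuously while the 3.0.9 node starts up, which makes the cache flooding easy to watch in real time (a minimal sketch using the same vmtouch binary and data path as above):
# watch -n 60 './vmtouch /var/lib/mongodb/*.ns | grep "Resident Pages"'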
The server then constantly performs a significant amount of read IO, presumably in an attempt to keep the entire contents of the ns files resident in memory:
# iostat -x 1 xvdg
Linux 3.13.0-77-generic (SERVER) 04/19/2016 _x86_64_ (2 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
3.43 0.06 2.26 46.98 0.62 46.65
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvdg 0.28 1.57 2185.88 21.08 33805.04 521.00 31.11 2.68 1.21 0.80 43.97 0.43 94.96
avg-cpu: %user %nice %system %iowait %steal %idle
18.75 0.00 3.12 40.62 0.00 37.50
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvdg 0.00 1.00 2430.00 73.00 37996.00 480.00 30.74 1.72 0.69 0.68 0.99 0.35 88.40
avg-cpu: %user %nice %system %iowait %steal %idle
6.28 0.00 3.14 45.03 0.00 45.55
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvdg 0.00 0.00 2285.00 0.00 35184.00 0.00 30.80 1.65 0.72 0.72 0.00 0.40 92.00
avg-cpu: %user %nice %system %iowait %steal %idle
1.57 0.00 3.66 45.55 0.52 48.69
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvdg 0.00 81.00 2525.00 136.00 40132.00 16740.00 42.74 9.04 3.40 0.64 54.56 0.36 95.60
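As a rough back-of-envelope check (assuming, as above, that the bulk of those reads are ns pages), a sustained ~35MB/s of reads against 87GB of ns files means it takes over 40 minutes just to pull every ns page in once, which is in the same ballpark as the startup times we are seeing:
# echo "87 * 1024 / 35 / 60" | bc -l
42.42285714285714285714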
Changing the instance type to an r3.4xlarge (122GB RAM) alleviates the problem, as there is now enough memory for all of the ns files to remain resident (and the server starts in 35 minutes, with the IO subsystem being the limiting factor):
# ./vmtouch -v /var/lib/mongodb/*.ns | tail -5
Files: 5572
Directories: 0
Resident Pages: 22822912/22822912 87G/87G 100%
Elapsed: 0.94295 seconds
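For completeness, the comparison that matters here can be reproduced with stock tools: the total on-disk size of the ns files versus the RAM available to the node (a sketch, assuming the same data path as above; on this setup the two commands report roughly 87G and 122GB respectively):
# du -shc /var/lib/mongodb/*.ns | tail -1
# free -g | grep ^Mem: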
This isn't a feasible option for us, though: an r3.4xlarge instance costs $1,102 for a 31-day month compared to $137 for an r3.large, and across a 3-node replica set that is clearly a lot of money.
Is duplicated by: SERVER-24824 - Mongo 3.0.12 with MMAPv1 can't serve more than 1k qps (Closed)