Core Server / SERVER-24824

Mongo 3.0.12 with MMAPv1 can't serve more than 1k qps

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 3.0.12
    • Component/s: MMAPv1, Performance
    • Labels: None
    • Operating System: ALL

      Hello All,

      The setup
      I am running a replica set in production without sharding. All nodes run the latest stable MongoDB 2.6, except one hidden node, which runs MongoDB 3.0 with MMAPv1.

      The data
      I have around 4 TB of data on each node (MMAPv1), spread across close to 7000 databases.

      The plan
      I decided to upgrade to 3.2, and as an intermediate step I have to upgrade to 3.0 first. Initially I had used WiredTiger on this node, but I encountered a problem when I sent production traffic to it; the full description is on the JIRA issue SERVER-24514. To avoid being blocked on that issue, I decided to go ahead with 3.0.12 on MMAPv1 instead of WiredTiger. To start, I added the aforementioned hidden member to the existing replica set (a minimal sketch of that step follows below) and began sending production-like read query traffic to it, to check whether it could withstand that much load. I did this for over a week.
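
      For reference, adding a hidden, priority-0 member is typically done by reconfiguring the set from the primary. This is only a minimal mongo shell sketch; the hostname and member _id are placeholders, not values from this report:

          cfg = rs.conf()
          cfg.members.push({
              _id: 7,                       // next unused member _id (placeholder)
              host: "mongo30-host:27017",   // placeholder hostname for the 3.0 node
              priority: 0,                  // never eligible to become primary
              hidden: true                  // invisible to clients reading from the set
          })
          rs.reconfig(cfg)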

      The plan was to roll out 3.0 on all secondaries if latencies and requests per second stayed close to the production pattern.

      The observation
      It was observed that the node couldn't serve the read traffic at a consistent rate: it serves ~1k queries per second, briefly shoots up to ~3.5k qps, and drops back to ~1k qps (see the attached ops_per_second.png). This pattern is not observed when the same traffic is sent to the 2.6.x nodes; those serve the same traffic at ~4k qps consistently. A quick way to sample the rate from the shell is sketched below.
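
      As a rough cross-check of the throughput numbers above, one can sample the serverStatus opcounters from the mongo shell. This is an illustrative sketch only, not taken from the report:

          // Sample the cumulative query counter twice, one second apart,
          // to estimate queries per second on this node.
          var before = db.serverStatus().opcounters.query;
          sleep(1000);   // mongo shell built-in; argument is milliseconds
          var after = db.serverStatus().opcounters.query;
          print("~" + (after - before) + " queries/sec");
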
      In the process of understanding what exactly is happening, I ran db.currentOp() on that node and didn't find anything in particular, but some queries took ~200ms to return. For those queries, "timeAcquiringMicros" was ~198,000 microseconds (~198ms); according to the docs, this is the "cumulative time in microseconds that the operation had to wait to acquire the locks". I would appreciate any help here.
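
      For anyone reproducing this, here is a hedged sketch of pulling lock-wait times out of db.currentOp() in the 3.0 shell. The lockStats layout (per-lock-type documents with timeAcquiringMicros keyed by lock mode) is assumed from the 3.0 currentOp output, and the 100 ms threshold is arbitrary:

          // List in-progress operations that have spent more than 100 ms
          // waiting to acquire locks, summed across lock types and modes.
          db.currentOp(true).inprog.forEach(function (op) {
              var waitedMicros = 0;
              var stats = op.lockStats || {};
              for (var lockType in stats) {
                  var t = stats[lockType].timeAcquiringMicros || {};
                  for (var mode in t) {
                      waitedMicros += Number(t[mode]);  // values may be NumberLong
                  }
              }
              if (waitedMicros > 100 * 1000) {
                  print(op.opid + " " + op.ns + " waited " +
                        (waitedMicros / 1000).toFixed(1) + " ms on locks");
              }
          });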

      Attachments:
        1. ss3.log (19.60 MB, Abhishek Amberkar)
        2. ss2.log (2.24 MB, Abhishek Amberkar)
        3. pagefaults_per_sec.png (31 kB, Abhishek Amberkar)
        4. ops_per_second.png (38 kB, Abhishek Amberkar)
        5. network_activity.png (31 kB, Abhishek Amberkar)
        6. mongo44.log (7 kB, Abhishek Amberkar)
        7. mongo_resident_mem.png (44 kB, Abhishek Amberkar)
        8. iostat3.log (3.49 MB, Abhishek Amberkar)
        9. iostat2.log (3.63 MB, Abhishek Amberkar)
        10. host_info3.txt (2 kB, Abhishek Amberkar)
        11. host_info2.txt (2 kB, Abhishek Amberkar)

            Assignee: Kelsey Schubert (kelsey.schubert@mongodb.com)
            Reporter: Abhishek Amberkar (abhishek.amberkar)
            Votes: 0
            Watchers: 5
