Type: Bug
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 3.2.4, 3.2.5, 3.2.6
Component/s: WiredTiger
Labels:
None

Operating System:
ALL
Steps To Reproduce:
On our system this occurs several times a day during the work week (PST).
Confidence Status:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

We have found that the last three releases of mongod (3.2.4, 3.2.5, 3.2.6) can get into a state in which the server cannot service read tickets in a timely manner. This occurs when the primary shard is under heavy read and write load. Once the server has been in this state (%cache used ~=96%, %cache dirty ~> 30, read tickets available ~>60) for more than ~15 minutes, it remains in that state for up to an hour, even when query load is reduced to ~1/3 of its previous level.
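For anyone trying to watch for the state described above, the following is a minimal sketch of how the three figures could be derived from a `db.serverStatus()` document. It assumes that both percentages are taken relative to `maximum bytes configured`, which may not match how the numbers in this report were obtained. The helper name `cachePressure` and the sample values are hypothetical.

```javascript
// Sketch only: compute cache-pressure figures from a serverStatus()-shaped
// document. Assumption: percentages are relative to the configured cache size.
function cachePressure(status) {
  var cache = status.wiredTiger.cache;
  var max = cache["maximum bytes configured"];
  var used = cache["bytes currently in the cache"];
  var dirty = cache["tracked dirty bytes in the cache"];
  return {
    pctUsed: (100 * used) / max,
    pctDirty: (100 * dirty) / max,
    readTicketsAvailable:
      status.wiredTiger.concurrentTransactions.read.available
  };
}

// Hypothetical sample document (values invented for illustration).
var sampleStatus = {
  wiredTiger: {
    cache: {
      "maximum bytes configured": 1000,
      "bytes currently in the cache": 960,
      "tracked dirty bytes in the cache": 310
    },
    concurrentTransactions: {
      read: { out: 68, available: 60, totalTickets: 128 }
    }
  }
};

var pressure = cachePressure(sampleStatus);
```

In the mongo shell the same function could be called as `cachePressure(db.serverStatus())`. Large counters may come back as `NumberLong` values there and may need converting before the division.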

Checking TPS and CPU load with iostat, I see that TPS is very low (usually under 1000 on ephemeral SSD storage) but CPU load is very high (usually 85%, with system time accounting for anywhere between 15 and 50% of CPU load).

Looking in the logs for long-running queries or heavy query loads while the server is in this condition shows no document scans. db.currentOp() can return over 200 outstanding operations, but none with secs_running > 1.
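The `db.currentOp()` check described above can be sketched as a small helper that counts outstanding operations and those running longer than a threshold. The helper name `summarizeOps` and the sample document are hypothetical. The sketch assumes the usual `{ inprog: [...] }` shape and treats a missing `secs_running` as 0.

```javascript
// Sketch only: summarize a currentOp()-shaped document.
function summarizeOps(currentOp, minSecs) {
  var ops = currentOp.inprog || [];
  var slow = ops.filter(function (op) {
    // Operations without secs_running are treated as not long-running.
    return (op.secs_running || 0) > minSecs;
  });
  return { total: ops.length, slow: slow.length };
}

// Hypothetical sample document (values invented for illustration).
var sampleCurrentOp = {
  inprog: [
    { opid: 1, op: "query", secs_running: 0 },
    { opid: 2, op: "update", secs_running: 1 },
    { opid: 3, op: "query" }
  ]
};

var summary = summarizeOps(sampleCurrentOp, 1);
```

In the mongo shell this could be called as `summarizeOps(db.currentOp(), 1)`. A large `total` with `slow` equal to 0 would match the symptom reported here.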

We are running a 3.2.6 sharded cluster on EC2. The AMI ID is amzn-ami-hvm-2015.03.0.x86_64-gp2 (ami-1ecae776). The instance type is i2.xlarge. No other services are running on the server.

wiredtiger status.txt
May 08 2016 06:11:32 PM UTC
24 kB
Mike Templeman
wired_tiger_cache_full.png
Jun 27 2016 11:10:58 PM UTC
651 kB
George Heppner
mongo-server-stall
Jun 21 2016 07:16:35 PM UTC
183 kB
Mike Templeman
mongoPrimaryDiagnostics.zip
May 09 2016 04:16:43 AM UTC
102.51 MB
Mike Templeman
mongo-mongodb_mongo_v3.2_ubuntu1404_30162fa8bbb9d7e7f7a789361aed7e046995f7b3_16_06_28_18_55_58.tgz
Jun 28 2016 08:55:52 PM UTC
80.81 MB
Ramon Fernandez Marina
mongo-5-6-2.log.zip
May 08 2016 06:11:32 PM UTC
61.74 MB
Mike Templeman

duplicates

SERVER-24580 Improve performance when WiredTiger cache is full

Closed

Assignee: Ramon Fernandez Marina

Reporter: Mike Templeman

Participants: Daniel Pasette, George Heppner, Kelsey Schubert, Michael Templeman, Mike Templeman, Ramon Fernandez Marina

Votes: 0

Watchers: 9

Created: May 08 2016 06:11:32 PM UTC

Updated: Jul 22 2016 06:28:35 PM UTC

Resolved: Jul 22 2016 06:28:35 PM UTC

GA Target Date: None

Public Preview Target Date: None

Private Preview Target Date: None

Experiment Target Date: None
