Type: Bug
Resolution: Done
Priority: Critical - P2
Fix Version/s: 3.0.6
Affects Version/s: 3.0.0-rc7
Component/s: WiredTiger
Labels:
- 28qa
- wttt

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Steps To Reproduce:

mongod is run with the following command line and no additional config file:

numactl --physcpubind=0-15 --interleave=all mongod --dbpath <path to data> --logpath <path to log> --storageEngine=wiredTiger --logappend --quiet --fork

The workload is driven by thumbtack-ycsb, using the following as the YCSB workload file:

recordcount=10000000
operationcount=2147483647
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.5
updateproportion=0.5
scanproportion=0
insertproportion=0
requestdistribution=zipfian
threadcount=16
maxexecutiontime=259200
exportmeasurementsinterval=30000
insertretrycount=10
ignoreinserterrors=true
readretrycount=1
updateretrycount=1
timeseries.granularity=100
reconnectionthroughput=10
reconnectiontime=1000

The test starts with the YCSB load phase:

<path to ycsb install>/bin/ycsb load mongodb -s -P <path to workload file>

immediately followed by the YCSB run phase:

<path to ycsb install>/bin/ycsb run mongodb -s -P <path to workload file>

Hardware: an Amazon Linux c3-4xlarge EC2 instance, with 16 cores bound to the mongod instance.

The operationcount is high because this run is intended as a longevity test, but the pauses start within the first 200 seconds and continue throughout the run.
Confidence Status:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

During performance tests against a standalone instance on an Amazon Linux EC2 host (dedicated c3-4xlarge), I am seeing extended pauses during a mixed 50/50 update and read run. The pauses seem to correspond to high write activity in general, but with no corresponding write activity on disk or in the cache.

The benchmark runs a mixed 50/50 workload of updates and reads. The read threads return in a reasonable time, giving low read latency, but the update threads seem to be blocked. When the updates start up again, their reported latency is very high.

 388 sec: 10043995 operations; 10020.5 current ops/sec; [UPDATE AverageLatency(us)=2054.55] [READ AverageLatency(us)=1162.3]
 390 sec: 10134561 operations; 45260.37 current ops/sec; [UPDATE AverageLatency(us)=492.66] [READ AverageLatency(us)=203.15]
 392 sec: 10255505 operations; 60441.78 current ops/sec; [UPDATE AverageLatency(us)=333.89] [READ AverageLatency(us)=183.77]
 394 sec: 10329907 operations; 37201 current ops/sec; [UPDATE AverageLatency(us)=546.05] [READ AverageLatency(us)=211.6]
 396 sec: 10329907 operations; 0 current ops/sec;
 398 sec: 10329907 operations; 0 current ops/sec;
 400 sec: 10329907 operations; 0 current ops/sec;
 402 sec: 10329907 operations; 0 current ops/sec;
 404 sec: 10329907 operations; 0 current ops/sec;
 406 sec: 10329907 operations; 0 current ops/sec;
 408 sec: 10329907 operations; 0 current ops/sec;
 410 sec: 10329907 operations; 0 current ops/sec;
 412 sec: 10329907 operations; 0 current ops/sec;
 414 sec: 10329907 operations; 0 current ops/sec;
 416 sec: 10329907 operations; 0 current ops/sec;
 418 sec: 10329907 operations; 0 current ops/sec;
 420 sec: 10329907 operations; 0 current ops/sec;
 422 sec: 10358050 operations; 14071.5 current ops/sec; [UPDATE AverageLatency(us)=32062.47] [READ AverageLatency(us)=178.26]
 424 sec: 10409153 operations; 25538.73 current ops/sec; [UPDATE AverageLatency(us)=1044.98] [READ AverageLatency(us)=193.84]
 426 sec: 10520720 operations; 55755.62 current ops/sec; [UPDATE AverageLatency(us)=375.93] [READ AverageLatency(us)=186.22]
 428 sec: 10645124 operations; 62202 current ops/sec; [UPDATE AverageLatency(us)=319.98] [READ AverageLatency(us)=183.11]
 430 sec: 10759784 operations; 57301.35 current ops/sec; [UPDATE AverageLatency(us)=364] [READ AverageLatency(us)=182.66]
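To make stalls like the one above easier to spot in a long run, the status lines can be scanned for consecutive samples that report 0 ops/sec. The following is a minimal sketch, not part of the original report: the function name `find_pauses` and the regular expression are illustrative assumptions based only on the line format shown in the excerpt above.

```python
import re

# Matches YCSB status lines of the form shown above, e.g.
# " 396 sec: 10329907 operations; 0 current ops/sec;"
# (assumed format, inferred from the excerpt in this report)
STATUS_RE = re.compile(
    r"^\s*(\d+) sec: (\d+) operations; ([\d.]+) current ops/sec;"
)


def find_pauses(lines):
    """Return (first_sec, last_sec) for each run of 0 ops/sec samples."""
    pauses = []
    start = last = None
    for line in lines:
        m = STATUS_RE.match(line)
        if m is None:
            continue  # skip anything that is not a status line
        sec = int(m.group(1))
        rate = float(m.group(3))
        if rate == 0.0:
            if start is None:
                start = sec  # first stalled sample of a new pause
            last = sec
        elif start is not None:
            pauses.append((start, last))  # throughput resumed
            start = None
    if start is not None:
        pauses.append((start, last))  # log ended during a pause
    return pauses


# Status lines copied from the excerpt above (latency fields kept as-is).
sample = """\
 394 sec: 10329907 operations; 37201 current ops/sec; [UPDATE AverageLatency(us)=546.05] [READ AverageLatency(us)=211.6]
 396 sec: 10329907 operations; 0 current ops/sec;
 398 sec: 10329907 operations; 0 current ops/sec;
 400 sec: 10329907 operations; 0 current ops/sec;
 402 sec: 10329907 operations; 0 current ops/sec;
 404 sec: 10329907 operations; 0 current ops/sec;
 406 sec: 10329907 operations; 0 current ops/sec;
 408 sec: 10329907 operations; 0 current ops/sec;
 410 sec: 10329907 operations; 0 current ops/sec;
 412 sec: 10329907 operations; 0 current ops/sec;
 414 sec: 10329907 operations; 0 current ops/sec;
 416 sec: 10329907 operations; 0 current ops/sec;
 418 sec: 10329907 operations; 0 current ops/sec;
 420 sec: 10329907 operations; 0 current ops/sec;
 422 sec: 10358050 operations; 14071.5 current ops/sec; [UPDATE AverageLatency(us)=32062.47] [READ AverageLatency(us)=178.26]
""".splitlines()

print(find_pauses(sample))  # [(396, 420)]
```

Since each status line summarizes the preceding two-second interval, the pause reported as (396, 420) corresponds to roughly 26 seconds with no completed operations.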

Looking at the metrics, there is a large plateau in general write activity, but there does not seem to be any noticeable uptick in write activity in WiredTiger, on disk (xvdb is the data-only disk), or in the cache. I have tested with the cache both full and not full, with the same result.

This appears similar to ~~SERVER-16269~~ and ~~SERVER-16662~~, and is the same as what I was seeing in CAP-1822.

This was actually tested with the b70d96e build (post rc7), but has been showing up in multiple RCs.


79492d9_raw.html
Feb 13 2015 07:30:25 PM UTC
2.14 MB
Brian Towles
79492d9-pauses.png
Feb 13 2015 07:30:25 PM UTC
89 kB
Brian Towles
checkpoint_correlation.png
Feb 04 2015 04:18:56 AM UTC
36 kB
Darren Wood
drop_out.js
Feb 05 2015 01:37:09 AM UTC
0.5 kB
Daniel Pasette
from_js_replication_raw.html
Feb 05 2015 06:11:01 AM UTC
836 kB
Brian Towles
from_js_replication.png
Feb 05 2015 06:11:01 AM UTC
80 kB
Brian Towles
get_pauses_with_updates.js
Feb 05 2015 06:13:04 AM UTC
0.6 kB
Brian Towles
load_pause_data.js
Feb 05 2015 06:13:04 AM UTC
0.2 kB
Brian Towles
pause_issue_cache_full_logs.zip
Feb 03 2015 01:49:35 AM UTC
201 kB
Brian Towles
pause_issue_cache_full_raw.html
Feb 03 2015 01:49:35 AM UTC
1.34 MB
Brian Towles
pause_issue_cache_not_full_logs.zip
Feb 03 2015 01:49:35 AM UTC
680 kB
Brian Towles
pause_issue_cache_not_full_raw.html
Feb 03 2015 01:49:35 AM UTC
2.48 MB
Brian Towles
pause_issue_full_cache_2015-02-02 19-13-00.png
Feb 03 2015 01:49:35 AM UTC
122 kB
Brian Towles
pause_issue_not_full_cache_2015-02-02 19-13-00.png
Feb 03 2015 01:49:35 AM UTC
161 kB
Brian Towles
PauseStackTrace.txt
Apr 26 2015 01:56:14 AM UTC
184 kB
Eitan Klein
pstack.txt
Feb 04 2015 06:42:46 PM UTC
49 kB
Neal Rigney
server_15944_run.png
Feb 07 2015 03:00:03 AM UTC
122 kB
Brian Towles
server_17157_journal_on_xvdc_pstack.txt
Feb 04 2015 05:18:53 PM UTC
49 kB
Brian Towles
server_17157_journal_on_xvdc_raw.html
Feb 04 2015 05:18:53 PM UTC
1.71 MB
Brian Towles
server_17157_journal_on_xvdc.png
Feb 04 2015 05:18:53 PM UTC
93 kB
Brian Towles
SERVER_17157-DISK-ACTIVITY.png
Feb 03 2015 03:25:38 AM UTC
55 kB
Brian Towles
server-17157-rc9-long.png
Feb 25 2015 06:43:43 AM UTC
48 kB
Darren Wood
server-17157-rc9-zoom.png
Feb 25 2015 06:43:43 AM UTC
34 kB
Darren Wood
server17157-ycsbparams.txt
May 05 2015 07:13:43 PM UTC
0.4 kB
Eitan Klein
TCMalloc-YCSB.txt
Apr 28 2015 12:42:19 PM UTC
122 kB
Eitan Klein
wt-nochange.txt
Apr 26 2015 01:56:14 AM UTC
7 kB
Eitan Klein
wt-writethrough.txt
Apr 26 2015 01:57:22 AM UTC
13 kB
Eitan Klein
YCSB_pause_logs_sample.txt
Feb 03 2015 01:49:35 AM UTC
2 kB
Brian Towles
YCSB_workload
Feb 03 2015 04:11:10 PM UTC
0.4 kB
Brian Towles
ycsb5050-prerc9-norepl.png
Feb 13 2015 04:45:34 AM UTC
75 kB
Darren Wood
YCSB-zlib.txt
Apr 28 2015 02:02:15 PM UTC
185 kB
Eitan Klein

is related to

SERVER-17907 B-tree eviction blocks access to collection for extended period under WiredTiger

Closed

SERVER-16575 intermittent slow inserts with WiredTiger b-tree

Closed

SERVER-16790 Lengthy pauses associated with checkpoints under WiredTiger

Closed

SERVER-16938 60-second stall between checkpoints under WiredTiger

Closed

related to

SERVER-17194 Low Throughput for YCSB 50-50 workload with high client threads

Closed

links to

WhiteBoard w/ raw data related to server-17157


Assignee: Michael Cahill (Inactive)

Reporter: Brian Towles (Inactive)

Participants: Alexander Gorrod, Asya Kamsky, Brian Towles, Daniel Pasette, David Daly, Eitan Klein, Michael Cahill

Votes: 1

Watchers: 23

Created: Feb 03 2015 01:49:35 AM UTC

Updated: Sep 11 2015 03:43:29 PM UTC

Resolved: Sep 11 2015 07:05:59 AM UTC

Confidence Status Last Update: 19/Aug/15 7:48 PM

GA Target Date: None

Public Preview Target Date: None

Private Preview Target Date: None

Experiment Target Date: None
