- Type: Bug
- Resolution: Duplicate
- Priority: Major - P3
- Affects Version/s: 3.0.0-rc6
- Component/s: Replication, WiredTiger
- Environment: CentOS 6
- Fully Compatible
- Linux
Our replica set secondaries are unable to keep up with the primary, and they fail in a peculiar way.
Previously we were on 2.6 and replication worked fine; nothing has changed since then except the upgrade to 3.0.0-rc6.
mongostat shows each primary taking approx. 4k updates/sec, across 8 shards, while the secondaries show 0 updates/sec. If I stop the mongod on a secondary, wipe its data directory, and restart it, the resync starts and runs properly: the member catches up and shows as 'SEC' in mongostat. That lasts only several seconds before updates/sec on the secondary drops to 0, while the primary is still at 4k updates/sec.
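Besides watching mongostat, the gap could be quantified by comparing the `optimeDate` fields in the replSetGetStatus output. The following is only a minimal sketch, assuming the status document has already been fetched (for example via `rs.status()`) and is handed in as a plain dict. The function name `replication_lag_seconds` is hypothetical, and it was not used to produce any of the numbers in this report.

```python
from datetime import datetime


def replication_lag_seconds(status):
    """Return {member name: lag in seconds} for each SECONDARY member.

    `status` is assumed to be shaped like replSetGetStatus output: a dict
    with a "members" list whose entries carry "name", "stateStr", and
    "optimeDate" (a datetime).
    """
    members = status["members"]
    primaries = [m for m in members if m["stateStr"] == "PRIMARY"]
    if not primaries:
        raise ValueError("no PRIMARY member in status document")
    primary_optime = primaries[0]["optimeDate"]
    # Lag is how far each secondary's last applied op trails the primary's.
    return {
        m["name"]: (primary_optime - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }
```

A lag that keeps growing while the primary sustains its write rate would match the behavior described above.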
Logs on the secondaries show many messages like these:
2015-01-26T14:02:48.942-0600 I QUERY [conn193] killcursors keyUpdates:0 writeConflicts:0 numYields:0 11777ms
2015-01-26T14:02:48.942-0600 I QUERY [conn109] killcursors keyUpdates:0 writeConflicts:0 numYields:0 11717ms
2015-01-26T14:02:48.943-0600 I QUERY [conn133] killcursors keyUpdates:0 writeConflicts:0 numYields:0 11702ms
2015-01-26T14:02:48.943-0600 I QUERY [conn206] killcursors keyUpdates:0 writeConflicts:0 numYields:0 11691ms
2015-01-26T14:02:48.943-0600 I QUERY [conn156] killcursors keyUpdates:0 writeConflicts:0 numYields:0 11681ms
2015-01-26T14:03:01.363-0600 I NETWORK [conn218] end connection 10.235.67.65:18027 (113 connections now open)
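To summarize how many operations are stalling and for how long, the log could be filtered for entries that end in a millisecond duration. This is a rough sketch: the regular expression is written only against the lines quoted in this report and is not a complete parser for the mongod log format, and `slow_operations` is a hypothetical helper name.

```python
import re

# Matches lines shaped like the ones quoted above:
#   <timestamp> <severity> <component> [<context>] <op> ... <N>ms
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<sev>[A-Z])\s+(?P<component>[A-Z]+)\s+"
    r"\[(?P<ctx>[^\]]+)\]\s+(?P<op>\S+)\b.*\s(?P<ms>\d+)ms$"
)


def slow_operations(lines, threshold_ms=1000):
    """Return (context, operation, duration_ms) for entries at or above the threshold.

    Lines that do not end in a duration (e.g. NETWORK connection messages)
    are skipped.
    """
    results = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        duration = int(match.group("ms"))
        if duration >= threshold_ms:
            results.append((match.group("ctx"), match.group("op"), duration))
    return results
```

Run over the excerpt above, this would pick out the five killcursors entries and skip the NETWORK line.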
I've upgraded several times, through rc4, rc5, and rc6, and am now even running the nightly build; all show the same behavior.
Note that this is a very write-intensive application. Data is stored on SSDs and journals on spinning disk; I've tried moving the journals to SSD as well, and it hasn't helped.
- duplicates: SERVER-16921 WT oplog bottleneck on secondary (Closed)