ISSUE SUMMARY
Reading from secondary nodes in a replica set may block the application of replicated write operations, because long-running read operations may not yield appropriately.
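To make the effect of yielding concrete, here is a deliberately simplified Python model. It is not MongoDB's locking code. One exclusive lock is shared by a single long read and the oplog applier, and time is counted in abstract ticks. The function name `max_applier_stall` and the tick model are hypothetical. Without yielding, the applier waits for the whole read. Yielding every k ticks caps each wait at k ticks.

```python
from typing import Optional


def max_applier_stall(read_ticks: int, yield_every: Optional[int]) -> int:
    """Longest run of consecutive ticks the (hypothetical) oplog applier waits.

    Toy model only: one exclusive lock, a reader that needs `read_ticks`
    ticks of lock-holding work, and an applier that runs whenever the
    reader releases the lock. If `yield_every` is None the reader never
    yields; otherwise it releases the lock after every `yield_every` ticks
    of work so the applier can apply one batch.
    """
    longest = current = done = 0
    while done < read_ticks:
        # The reader holds the lock for this tick, so the applier waits.
        done += 1
        current += 1
        longest = max(longest, current)
        if yield_every and done % yield_every == 0 and done < read_ticks:
            # The reader yields: the applier gets the lock and the stall ends.
            current = 0
    return longest


# A 10-second scan modelled as 100 ticks of 0.1 s each.
print(max_applier_stall(100, None))  # 100: applier blocked for the entire scan
print(max_applier_stall(100, 10))    # 10: yielding bounds each stall
```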
USER IMPACT
High-volume read operations on secondary nodes may increase replication lag on those nodes, so reads served by them may return out-of-date data.
In extreme cases an affected node may become "stale", meaning it has fallen so far behind that it can no longer catch up from the oplog. Stale nodes must be resynchronized. If enough nodes in a replica set become stale, availability may be impacted.
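A node becomes stale when its sync source's oplog no longer contains the last operation the node applied, because the capped oplog has already overwritten it. The sketch below models lag and that staleness check using plain numeric timestamps in seconds. The function names and the numeric timestamps are illustrative assumptions, not MongoDB's internal API.

```python
def replication_lag(source_last_op: float, node_last_applied: float) -> float:
    """Seconds the node is behind its sync source (never negative)."""
    return max(0.0, source_last_op - node_last_applied)


def is_stale(node_last_applied: float, source_oldest_op: float) -> bool:
    """True if the sync source's oplog has already discarded ops the node needs.

    To resume, the node must find its last applied op in the source's oplog.
    If that op is older than the oldest entry the source still keeps, the
    node cannot catch up and has to be resynchronized.
    """
    return node_last_applied < source_oldest_op


# Hypothetical timeline: the source's oplog covers t = 1000..4600 (a 1-hour window).
oldest, newest = 1000.0, 4600.0
print(replication_lag(newest, 4000.0), is_stale(4000.0, oldest))  # lagging but recoverable
print(replication_lag(newest, 900.0), is_stale(900.0, oldest))    # fell off the oplog: stale
```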
WORKAROUNDS
The preferred workaround is to suspend all read operations on secondary nodes.
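For applications that reach secondaries through a driver read preference, one way to apply this workaround is to set the `readPreference` connection-string option back to `primary`, which is also the default. The helper below is a hypothetical sketch using only Python's standard library. It rewrites the URI text and keeps other options untouched. It does not cover clients that connect directly to a secondary; those reads have to be stopped separately.

```python
from urllib.parse import urlsplit, urlunsplit

# Options that select or filter secondaries. Drivers reject tag sets and
# maxStalenessSeconds when combined with readPreference=primary, so drop them too.
_SECONDARY_OPTIONS = {"readpreference", "readpreferencetags", "maxstalenessseconds"}


def force_primary_reads(uri: str) -> str:
    """Return `uri` with any secondary read preference replaced by primary."""
    parts = urlsplit(uri)
    # Keep raw "key=value" pairs so other option values are not re-encoded.
    # URI option names are case-insensitive, so compare keys in lower case.
    kept = [
        pair
        for pair in parts.query.split("&")
        if pair and pair.split("=", 1)[0].lower() not in _SECONDARY_OPTIONS
    ]
    kept.append("readPreference=primary")
    return urlunsplit(parts._replace(query="&".join(kept)))


print(force_primary_reads(
    "mongodb://a:27017,b:27017/app?replicaSet=rs0&readPreference=secondaryPreferred"))
# mongodb://a:27017,b:27017/app?replicaSet=rs0&readPreference=primary
```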
Alternatively, the oplog size can be increased on secondary nodes. This is only a suitable workaround if the nodes go through periods with no reads, during which replication can catch up.
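A rough way to size the oplog for this workaround uses a few simplifying assumptions. First, replication stops entirely while reads run, as the original description reports a replication rate of 0 during the scans. Second, the primary keeps writing at a steady rate W. Third, once reads stop, the secondary applies at a faster rate A. Under these assumptions the backlog W * T built during a read period of length T drains at A - W, so the oplog must retain W * (T + W * T / (A - W)) = W * T * A / (A - W) bytes. The function name and the example rates are hypothetical.

```python
def min_oplog_bytes(write_rate: float, read_period_s: float, apply_rate: float) -> float:
    """Minimum oplog size (bytes) so a secondary stalled for `read_period_s`
    can still catch up once reads stop.

    write_rate: bytes/s the primary writes to its oplog (assumed constant).
    apply_rate: bytes/s the secondary can apply when no reads are running.
    Assumes replication makes no progress at all while reads run.
    """
    if apply_rate <= write_rate:
        raise ValueError("secondary can never catch up: apply_rate must exceed write_rate")
    backlog = write_rate * read_period_s              # bytes queued during the stall
    catch_up_s = backlog / (apply_rate - write_rate)  # time to drain once reads stop
    # The oldest unapplied entry must survive the stall plus the catch-up time.
    return write_rate * (read_period_s + catch_up_s)


# Hypothetical: 1 MB/s of writes, a 1-hour read window, 4 MB/s apply rate when idle.
print(min_oplog_bytes(1e6, 3600, 4e6) / 1e9)  # about 4.8 GB
```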
AFFECTED VERSIONS
MongoDB 3.0.0 through 3.0.3.
FIX VERSION
The fix is included in the 3.0.4 production release.
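To check whether a given server falls in the affected range, its version string (for example the `version` field returned by the `buildInfo` command) can be compared against 3.0.0 through 3.0.3. A minimal sketch, assuming version strings of the form `major.minor.patch`. Any `-suffix` such as `-rc0` is stripped and not treated specially. The function name is hypothetical.

```python
def is_affected(version: str) -> bool:
    """True if `version` falls in the affected range 3.0.0 through 3.0.3."""
    # Keep only the numeric part, e.g. "3.0.3-rc1" -> "3.0.3".
    numeric = version.split("-", 1)[0]
    # Expects at least three dot-separated components.
    major, minor, patch = (int(x) for x in numeric.split(".")[:3])
    return (3, 0, 0) <= (major, minor, patch) <= (3, 0, 3)


for v in ["2.6.10", "3.0.2", "3.0.4"]:
    print(v, is_affected(v))
```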
Original description
- Three table scans, each taking 5-10 seconds and returning no results, were run on the secondary against a collection of about 12M documents (marked A-B, C-D and E-F above). At the same time, documents were inserted into the same collection on the primary, driving replication traffic.
- During the table scans, the replication rate falls to 0 and replication lag builds.
- The graphs show straight lines between the beginning and end of each stall, indicating that the serverStatus command, on which the data collection depends, was blocked as well.
- The primary is not similarly affected by the same table scan.
- The problem reproduces on both WiredTiger and MMAPv1.
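To watch replication lag build during a reproduction like this, one option is to poll the `replSetGetStatus` command (the shell's `rs.status()`) on the primary and compare each secondary's `optimeDate` with the primary's. The sketch below works on an already-fetched status document, for example the result of PyMongo's `client.admin.command("replSetGetStatus")`. The helper name and the sample document are hypothetical, and the sample is trimmed to the fields the helper uses.

```python
from datetime import datetime, timedelta


def secondary_lag_seconds(status: dict) -> dict:
    """Map each SECONDARY member's name to its lag behind the PRIMARY, in seconds."""
    members = status["members"]
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Hypothetical, trimmed replSetGetStatus result captured mid-scan.
t0 = datetime(2000, 1, 1, 12, 0, 0)
status = {
    "members": [
        {"name": "a:27017", "stateStr": "PRIMARY", "optimeDate": t0},
        {"name": "b:27017", "stateStr": "SECONDARY", "optimeDate": t0 - timedelta(seconds=8)},
    ]
}
print(secondary_lag_seconds(status))  # {'b:27017': 8.0}
```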
Is duplicated by:
- SERVER-18200 Long running queries on secondary causes replication to fall behind (Closed)
- SERVER-18325 MongoDb 3.0.2 background index creation still blocking (Closed)