Type: Improvement
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 7.0.0-rc0, 6.0.13, 5.0.24, 4.4.29
Affects Version/s: None
Component/s: Replication, Write Ops
Labels: None
Assigned Teams: Storage Execution
Backwards Compatibility: Fully Compatible
Backport Requested: v7.2, v7.0, v6.0, v5.0, v4.4
Sprint: Execution Team 2023-03-06, Execution Team 2023-03-20
Case:
Confidence Status: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name: None
Goal Link: None

An oplog hole is a Timestamp in the oplog at which an active storage transaction will commit, and which is earlier than the Timestamp at which an already-committed storage transaction wrote its oplog entry. For example, a storage transaction at Timestamp 20 can commit first, with a storage transaction at Timestamp 10 committing later in wall-clock time. Oplog readers are prevented from reading beyond the oplog hole so that they don't miss any oplog entries which might still commit. Keeping an oplog hole open for an extended period of wall-clock time can stall replication.
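To make the visibility rule concrete, here is a minimal Python sketch of the Timestamp 10 / Timestamp 20 example above. It is a toy model only: `visible_through` and the two sets are hypothetical names for illustration and are not taken from the server code.

```python
def visible_through(committed, in_flight):
    """Return the newest committed timestamp a reader may see, or None.

    Toy model (hypothetical): `committed` holds timestamps whose oplog
    entries are already written, `in_flight` holds timestamps that were
    allocated but whose storage transactions have not committed yet.
    """
    # The earliest uncommitted timestamp is the oplog hole.
    hole = min(in_flight) if in_flight else None
    # Readers may only see entries strictly before the hole.
    visible = [ts for ts in committed if hole is None or ts < hole]
    return max(visible) if visible else None


# Timestamp 20 has committed, Timestamp 10 is still open: readers stop at 5.
print(visible_through({5, 20}, {10}))       # 5
# Once Timestamp 10 commits, the hole closes and 20 becomes visible.
print(visible_through({5, 10, 20}, set()))  # 20
```

The longer Timestamp 10 stays in the in-flight set, the longer readers are held back at 5, which is the replication stall described above.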

Vectored insert is an example of an operation which pre-allocates the Timestamps at which it writes user data and the corresponding oplog entries. In MongoDB 4.4, SERVER-46161 increased the default internal batch size for vectored insert from 64 to 500. This has been seen to lead to higher tail latencies for vectored inserts (SERVER-65054).

Other operations cause oplog holes too. If expensive work is done (e.g. within an OpObserver) after the oplog slot is allocated and before the storage transaction commits, then those operations can stall replication too. Logging the time spent between allocating the oplog slot and committing would give more insight into these areas, and could also serve as a signal in our performance testing.
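As a rough illustration of the kind of tracking proposed here, the Python sketch below times how long a slot is held between allocation and commit and logs when that exceeds a threshold. Everything in it is an assumption made for illustration: `track_oplog_slot`, `slow_threshold_secs`, the logger name, and the 0.1 second default are hypothetical and do not reflect how the server implements or names this.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("oplog_slot_sketch")  # hypothetical logger name


@contextmanager
def track_oplog_slot(timestamp, slow_threshold_secs=0.1,
                     clock=time.monotonic, durations=None):
    """Measure how long an allocated oplog slot stays open (toy model).

    Enter the block right after the slot is allocated and leave it when
    the storage transaction commits or aborts.
    """
    start = clock()
    try:
        yield
    finally:
        held = clock() - start
        if durations is not None:
            durations.append(held)  # keep the sample for later reporting
        if held >= slow_threshold_secs:
            logger.warning("oplog slot %s held open for %.3fs",
                           timestamp, held)


durations = []
with track_oplog_slot(10, durations=durations):
    time.sleep(0.01)  # stand-in for OpObserver work before the commit
```

Passing the clock in as a parameter keeps the measurement testable without real sleeps; the collected durations could feed a performance-test signal in the same way.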


Attachments:
oplog_visbility_lag_example.png (135 kB, Oct 17 2022 06:34:08 PM UTC)
Screen Shot 2022-10-05 at 9.27.40 AM.png (132 kB, Oct 05 2022 07:28:28 AM UTC)

is related to:
SERVER-84449 High WiredTiger session concurrency can increase replication write latency (Open)
SERVER-84467 Add duration of how long an oplog slot is kept open to FTDC (Open)
SERVER-74604 Report the same metrics in both "Slow query" and the profiler data documents (Backlog)
SERVER-65054 Avoid slow insert batches blocking replication (Closed)

Assignee: Dianna Hohensee (Inactive)
Reporter: Max Hirschhorn
Participants: Bruce Lucas, Dianna Hohensee, Githook User, Louis Williams, Max Hirschhorn
Votes: 0
Watchers: 17
Created: Oct 02 2022 12:54:35 PM UTC
Updated: Feb 11 2025 04:37:22 AM UTC
Resolved: Mar 13 2023 07:23:35 PM UTC
Confidence Status Last Update: 01/Mar/23 3:22 PM
GA Target Date: None
Public Preview Target Date: None
Private Preview Target Date: None
Experiment Target Date: None
