Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 3.6.16, 4.2.2, 4.0.14, 4.3.2
Affects Version/s: 3.6.15
Component/s: Replication
Labels: None
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.2, v4.0, v3.6
Sprint: Repl 2019-11-18, Repl 2019-12-02
Linked BF Score: 8
Confidence Status: None
Work Order: 3
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name: None
Goal Link: None

The server_status_metrics.js test fails (very rarely) because of a race in how the repl.buffer.count metric is updated. There's a period when the rsBackgroundSync thread has added oplog entries to the buffer but hasn't yet incremented repl.buffer.count. During this period, the ReplBatcher thread can clear the buffer and decrement repl.buffer.count. Since the count can be decremented before it's incremented, it can be briefly negative. The test doesn't expect this race.

First, the test inserts 1000 docs with w: 2. The secondary's oplog buffer fills and empties, so the metric is incremented by 1000 and then decremented by 1000. The test calls serverStatus on the secondary and asserts that repl.buffer.count >= 0. The value is in fact 0, so the assertion passes.
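The quiescent case above can be modeled in a few lines of Python. This is a toy sketch, not server code: `Gauge`, `fill_then_drain`, and the `deque` are hypothetical stand-ins for bufferCountGauge and the oplog buffer, and the figure of 1000 entries mirrors the test's insert count.

```python
from collections import deque


class Gauge:
    """Hypothetical stand-in for the counter behind repl.buffer.count."""

    def __init__(self):
        self.value = 0

    def increment(self, n=1):
        self.value += n

    def decrement(self, n=1):
        self.value -= n


def fill_then_drain(num_entries):
    """Buffer entries, count them, pop them all, and return the final gauge value."""
    buffer = deque()
    gauge = Gauge()

    # Producer side: buffer the entries, then count them in one step.
    buffer.extend(range(num_entries))
    gauge.increment(num_entries)

    # Consumer side: pop each entry and decrement the count by one.
    while buffer:
        buffer.popleft()
        gauge.decrement(1)

    return gauge.value


final_count = fill_then_drain(1000)
```

When the gauge is read only after both sides have finished, the increments and decrements balance and the value is 0, which is why the first assertion passes.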

Next, the test updates all 1000 docs with w: 2. Events can then proceed in this order:

1. The rsBackgroundSync thread, in BackgroundSync::_enqueueDocuments, buffers 1000 oplog entries. bufferCountGauge is still 0.
2. The ReplBatcher thread, in SyncTail::tryPopAndWaitForMore, calls bufferCountGauge.decrement(1) a thousand times. The gauge is now -1000.
3. The test calls serverStatus. repl.buffer.count is -1000, so the assertion fails.
4. The rsBackgroundSync thread, in BackgroundSync::_enqueueDocuments, calls bufferCountGauge.increment(1000), which brings the gauge back to 0.
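The interleaving described here can be replayed deterministically with two Python threads. This is only an illustrative sketch under stated assumptions: `Gauge`, `replay_race`, and the three `threading.Event` objects are hypothetical and exist purely to force the unlucky ordering. The real server has no such synchronization points, which is why the failure is rare.

```python
import threading
from collections import deque


class Gauge:
    """Hypothetical thread-safe stand-in for bufferCountGauge."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self, n=1):
        with self._lock:
            self._value += n

    def decrement(self, n=1):
        with self._lock:
            self._value -= n

    def get(self):
        with self._lock:
            return self._value


def replay_race(num_entries=1000):
    """Force the four-step ordering; return (sampled value, final value)."""
    buffer = deque()
    gauge = Gauge()
    buffered = threading.Event()  # set once the entries are in the buffer
    drained = threading.Event()   # set once the consumer has emptied the buffer
    sampled = threading.Event()   # set once the "serverStatus" read has happened

    def producer():
        # Models rsBackgroundSync: buffer first, count later.
        buffer.extend(range(num_entries))
        buffered.set()
        sampled.wait()  # hold the window open until the gauge has been read
        gauge.increment(num_entries)

    def consumer():
        # Models ReplBatcher: pop every entry, decrementing by one each time.
        buffered.wait()
        while buffer:
            buffer.popleft()
            gauge.decrement(1)
        drained.set()

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()

    drained.wait()
    observed = gauge.get()  # models the test's serverStatus call
    sampled.set()

    for t in threads:
        t.join()
    return observed, gauge.get()
```

Calling `replay_race(1000)` returns `(-1000, 0)`: the value sampled inside the window is negative, while the value after both threads finish is back to 0.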

Assignee: A. Jesse Jiryu Davis
Reporter: A. Jesse Jiryu Davis
Participants: A. Jesse Jiryu Davis, Githook User
Votes: 0
Watchers: 2

Created: Nov 15 2019 09:19:42 PM UTC
Updated: Oct 29 2023 10:14:51 PM UTC
Resolved: Nov 19 2019 06:38:47 PM UTC
