Type: Improvement
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 6.0.1, 5.0.11, 6.1.0-rc0
Affects Version/s: None
Component/s: None
Labels: None
Backwards Compatibility: Fully Compatible
Backport Requested: v6.0, v5.3, v5.0, v4.4
Sprint: Repl 2022-05-02, Repl 2022-05-16, Repl 2022-05-30
Linked BF Score: 135
Confidence Status: None
Work Order: 3
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name: None
Goal Link: None

We cancel the election timeout on secondaries whenever the primary liveness is updated, which can happen on every oplog batch. We cancel the liveness timeout on primaries whenever the oldest secondary liveness is updated, which can happen on every replSetUpdatePosition. Cancelling a timer turns out to be quite expensive, at least on Linux (likely system call overhead), and we do it while holding the replication lock, which increases contention on that already-hot mutex.

We can greatly reduce this cost with a class that handles "cancel and reschedule" by keeping track of the latest time requested for the timeout. When the existing timer fires, the class reschedules for that latest time at that point, instead of rescheduling immediately on every update. This means we get no cancels and one reschedule per timeout interval (rather than one for every minuscule bump forward of the timer).
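The idea above can be illustrated with a minimal Python sketch. This is not the server's implementation: `FakeScheduler` and `DelayableTimeout` are hypothetical names invented for this example, the scheduler is a single-threaded stand-in with a manual clock, and the sketch assumes deadlines only ever move forward, as in the "bump forward" case described above.

```python
import heapq
import itertools


class FakeScheduler:
    """Hypothetical single-threaded scheduler with a manual clock (illustration only)."""

    def __init__(self):
        self.now = 0.0
        self.schedule_calls = 0  # counts how often a real timer would be armed
        self._heap = []
        self._seq = itertools.count()  # tie-breaker so callbacks are never compared

    def schedule_at(self, when, callback):
        self.schedule_calls += 1
        heapq.heappush(self._heap, (when, next(self._seq), callback))

    def advance_to(self, when):
        # Run every timer due at or before `when`, in time order.
        while self._heap and self._heap[0][0] <= when:
            due, _, callback = heapq.heappop(self._heap)
            self.now = due
            callback()
        self.now = when


class DelayableTimeout:
    """Records the latest deadline; re-arms only when the armed timer fires."""

    def __init__(self, scheduler, on_timeout):
        self._scheduler = scheduler
        self._on_timeout = on_timeout
        self._deadline = None
        self._armed = False

    def delay_until(self, deadline):
        # Cheap path: remember the new deadline; no cancel, usually no schedule.
        self._deadline = deadline
        if not self._armed:
            self._armed = True
            self._scheduler.schedule_at(deadline, self._fire)

    def _fire(self):
        self._armed = False
        if self._scheduler.now < self._deadline:
            # The deadline was pushed forward since arming: reschedule once, now.
            self._armed = True
            self._scheduler.schedule_at(self._deadline, self._fire)
        else:
            # The deadline really has passed: run the timeout action.
            self._deadline = None
            self._on_timeout()
```

In a simulated run of this sketch with 26 deadline updates at one-second intervals and a ten-second timeout, the timer is armed 4 times and never cancelled, compared with one cancel and one reschedule per update in the naive approach.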

Issue Links:
is related to SERVER-59776 50% regression in single multi-update (Closed)

Assignee: Matthew Russotto
Reporter: Matthew Russotto
Participants: Githook User, Matthew Russotto
Votes: 0
Watchers: 7

Created: Apr 27 2022 07:28:37 PM UTC
Updated: Oct 29 2023 09:38:55 PM UTC
Resolved: May 18 2022 06:35:07 PM UTC
Confidence Status Last Update: 27/Apr/22 7:30 PM
