Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels: repl-shortlist
Assigned Teams: Replication
Confidence Status: None
Work Order: 3
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name: None
Goal Link: None

In ~~SERVER-56054~~ we made a change so that any wait performed by an oplog applier thread pool thread ends once the maxIdleThreadAge timeout elapses, if the thread has not been woken sooner. This helps mitigate a glibc bug that can cause a condition variable signal to be lost.

Currently, if a user hits this issue, it is difficult to diagnose from FTDC and logs alone. We also don't have a definitive list of all such glibc bugs or of the exact glibc versions they affect on each Linux distribution, so we can't easily say for certain whether this is the problem a user faced.

We should look into diagnostics we could add (a serverStatus metric, log messages) that would more definitively identify cases where there was work to do, yet oplog applier threads woke up only because they hit maxIdleThreadAge.
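As a rough illustration of the kind of metric described above, the following standalone C++ sketch classifies each completed wait by whether it ended on a signal or on the idle timeout, and whether work was queued at that moment. It is not the server's thread pool code: `IdleWakeupStats`, `InstrumentedQueue`, and the counter names are hypothetical, and the `notify` flag on `push` exists only to simulate a lost signal.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>

// Hypothetical counters for illustration; these are not existing serverStatus fields.
struct IdleWakeupStats {
    std::uint64_t signaledWakeups = 0;          // wait ended before the deadline
    std::uint64_t idleTimeouts = 0;             // deadline hit, nothing queued
    std::uint64_t timeoutsWithPendingWork = 0;  // deadline hit although work was queued

    // Classifies one completed wait. Call with the queue mutex held.
    void record(bool timedOut, std::size_t pendingTasks) {
        if (!timedOut) {
            ++signaledWakeups;
        } else if (pendingTasks == 0) {
            ++idleTimeouts;
        } else {
            ++timeoutsWithPendingWork;
        }
    }
};

// Hypothetical task queue standing in for a thread pool's pending-task list.
class InstrumentedQueue {
public:
    // Passing notify=false enqueues without signaling, simulating a lost signal.
    void push(int task, bool notify = true) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _tasks.push_back(task);
        }
        if (notify) {
            _cv.notify_one();
        }
    }

    // Blocks until a task is queued or maxIdle elapses. If the call actually
    // waited, records why the wait ended. Returns true and fills *task if a
    // task was dequeued.
    bool waitAndPop(std::chrono::milliseconds maxIdle, int* task) {
        assert(task != nullptr);
        std::unique_lock<std::mutex> lk(_mutex);
        const auto deadline = std::chrono::steady_clock::now() + maxIdle;
        bool waited = false;
        bool timedOut = false;
        // Spurious wakeups loop again with the same deadline.
        while (_tasks.empty() && !timedOut) {
            waited = true;
            timedOut = (_cv.wait_until(lk, deadline) == std::cv_status::timeout);
        }
        if (waited) {
            _stats.record(timedOut, _tasks.size());
        }
        if (_tasks.empty()) {
            return false;
        }
        *task = _tasks.front();
        _tasks.pop_front();
        return true;
    }

    // Returns a snapshot of the counters.
    IdleWakeupStats stats() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _stats;
    }

private:
    mutable std::mutex _mutex;
    std::condition_variable _cv;
    std::deque<int> _tasks;
    IdleWakeupStats _stats;
};
```

In this sketch, a push that races with the deadline can also be counted under `timeoutsWithPendingWork`, so an occasional increment would not be conclusive on its own. A counter that keeps growing on an otherwise healthy node would be the more suggestive sign of lost signals.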

related to:
- SERVER-56054 Change minThreads value for replication writer thread pool to 0 (Closed)
- SERVER-92554 Consider lowering maxIdleThreadAge for oplog applier thread pool (Open)

Assignee: Unassigned
Reporter: Kaitlin Mahar
Participants: Kaitlin Mahar
Votes: 0
Watchers: 10

Created: Jul 17 2024 06:48:28 PM UTC
Updated: Jul 22 2024 05:28:59 PM UTC
