Type: Bug
Resolution: Duplicate
Priority: Critical - P2
Fix Version/s: None
Affects Version/s: None
Component/s: Replication, Storage
Labels: None
Assigned Teams: Replication
Operating System: ALL
Case:
Confidence Status: None
Work Order: 0
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name: None
Goal Link: None

If a disk fails in a way that blocks IO without ever returning (admittedly a rare occurrence), the affected mongod never gives up waiting for the IO to complete. It continues to answer heartbeats as normal, so the other nodes keep trusting it even though it is permanently dysfunctional.

As a result, a replica set or sharded cluster can eventually lock up and stay locked until the single faulty node is identified and terminated.
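
To illustrate the failure mode described above, here is a minimal sketch of a storage liveness probe: a small write plus fsync run on a separate thread with a deadline, so that a hung disk shows up as a timeout instead of an indefinite wait. This is not MongoDB code and not the design of any linked ticket. The function name `probe_disk`, the probe file name, and the `write_fn` hook (used here only to simulate a hang) are all hypothetical.

```python
import os
import threading


def probe_disk(directory, timeout_seconds, write_fn=None):
    """Return True only if a small write+fsync in `directory` succeeds in time.

    Hypothetical helper for illustration; `write_fn` lets a caller swap in a
    different IO action (for example, one that simulates a hung disk).
    """
    result = {"ok": False}
    done = threading.Event()

    def default_write():
        # Write a few bytes and force them to stable storage.
        path = os.path.join(directory, "watchdog_probe.tmp")
        with open(path, "wb") as f:
            f.write(b"ping")
            f.flush()
            os.fsync(f.fileno())
        os.remove(path)

    def worker():
        try:
            (write_fn or default_write)()
            result["ok"] = True
        except OSError:
            # An IO error is also treated as a failed probe.
            result["ok"] = False
        finally:
            done.set()

    # Daemon thread: if the IO never returns, the thread stays blocked
    # but does not prevent the process from exiting.
    threading.Thread(target=worker, daemon=True).start()

    # Wait at most `timeout_seconds`; a hung disk means wait() returns False.
    finished = done.wait(timeout_seconds)
    return finished and result["ok"]
```

A process could run such a probe periodically and, when it fails, stop answering heartbeats or terminate itself so that the other nodes stop trusting it. Which reaction is appropriate is a design decision outside the scope of this sketch.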

duplicates

SERVER-29947 Implement Storage Node Watchdog [Closed]

is duplicated by

SERVER-15417 Arbiter didn't elect primary if OS is unreachable (except ping) [Closed]

SERVER-28422 Cluster stuck because replication heartbeat does not detect hanging members [Closed]

related to

SERVER-29980 Built-in hang detection diagnostics and recovery [Closed]

SERVER-9552 when replica set member has full disk, step down to (sec|rec)? [Backlog]

Assignee: [DO NOT USE] Backlog - Replication Team
Reporter: Andrew Ryder (Inactive)
Participants: [DO NOT USE] Backlog - Replication Team, Andrew Ryder, Andy Schwerin, Geert Bosch, Jonathan Reams, Kelsey Schubert, Niraj Londhe, Ramon Fernandez Marina, VictorGP
Votes: 5
Watchers: 47

Created: Jun 03 2014 01:51:46 AM UTC
Updated: Dec 06 2022 05:05:17 AM UTC
Resolved: Jul 17 2017 09:58:08 PM UTC
