- Type: Bug
- Resolution: Duplicate
- Priority: Critical - P2
- None
- Affects Version/s: None
- Component/s: Replication, Storage
- None
- Replication
- ALL
- (copied to CRM)
If a disk fails in a way that blocks I/O without ever returning (admittedly a rare occurrence), the affected mongod never gives up waiting for the I/O to complete. It still answers heartbeats as normal, so the other nodes continue to trust it even though it is permanently dysfunctional.
A replica set or sharded cluster can therefore end up locked until the single faulty node is identified and terminated.
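
To illustrate the kind of check that would catch this failure mode, here is a minimal sketch of a disk liveness probe with a deadline. It is not taken from the server code base or from SERVER-29947; the names `write_probe` and `disk_responds` and the timeout values are hypothetical. The idea is to run a small write plus fsync on a separate thread and treat the disk as unhealthy if the call does not return in time.

```python
import os
import tempfile
import threading


def write_probe(directory):
    """Write and fsync a small file in `directory`; blocks if the disk hangs."""
    fd, path = tempfile.mkstemp(dir=directory, prefix="watchdog_probe_")
    try:
        os.write(fd, b"ping")
        os.fsync(fd)  # force the write through to the device
    finally:
        os.close(fd)
        os.unlink(path)


def disk_responds(directory, timeout_s, probe=write_probe):
    """Return True only if `probe(directory)` succeeds within `timeout_s` seconds."""
    result = {"ok": False}
    finished = threading.Event()

    def run():
        try:
            probe(directory)
            result["ok"] = True
        except OSError:
            pass  # an I/O error also counts as "not healthy"
        finally:
            finished.set()

    # Daemon thread: a probe stuck in blocked I/O must not keep the process alive.
    threading.Thread(target=run, daemon=True).start()
    return finished.wait(timeout_s) and result["ok"]
```

A supervising loop could call `disk_responds(data_dir, 60.0)` periodically and terminate the process when it returns False, so that the node stops answering heartbeats and the rest of the set can fail over. The probe thread itself may stay blocked forever, which is why the decision has to be made by the waiting thread.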
duplicates
- SERVER-29947 Implement Storage Node Watchdog (Closed)
is duplicated by
- SERVER-15417 Arbiter didn't elect primary if OS is unreachable (except ping) (Closed)
- SERVER-28422 Cluster stuck because replication heartbeat does not detect hanging members (Closed)
related to
- SERVER-29980 Built-in hang detection diagnostics and recovery (Closed)
- SERVER-9552 when replica set member has full disk, step down to (sec|rec)? (Backlog)