  1. Core Server
  2. SERVER-55875

Make the thread liveness monitor detect stuck disk I/O

    • Type: Improvement
    • Resolution: Won't Fix
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None

      This new behavior would help in the HELP ticket incident. While we have the Enterprise Watchdog monitoring storage health, a Community edition mongod primary can remain stuck on a faulty drive for hours without stepping down. The Watchdog addresses this problem quickly, but there is no good story for the Community edition at all.

      While the Enterprise Watchdog will continue to provide the premium service, the Community edition will get a more generic, slower solution that still prevents a multi-hour outage. The reaction times differ by design, which maintains the service differentiation: the Watchdog can detect such an outage in as little as 10-30 seconds (depending on configuration), while the thread liveness monitor will reach the same result after 5-10 minutes of outage.
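
      To make the idea concrete, here is a minimal sketch of a timeout-based disk liveness probe. It is not the server's implementation: the function names (`write_probe`, `disk_is_live`, `simulated_stuck_probe`), the probe file name, and the timeout values are all hypothetical, and the real monitor would presumably live in the server's C++ code. The sketch only shows the general technique: run a small write plus fsync on a separate thread and treat the disk as stuck if the probe does not finish in time.

```python
import os
import tempfile
import threading
import time


def write_probe(directory: str) -> None:
    """Write a small file and force it to disk; blocks if the drive is stuck."""
    path = os.path.join(directory, "liveness_probe.tmp")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"ping")
        f.flush()
        os.fsync(f.fileno())
    os.remove(path)


def simulated_stuck_probe(directory: str) -> None:
    """Stand-in for a hung drive (assumption for the demo): just sleeps."""
    time.sleep(2)


def disk_is_live(directory: str, timeout_s: float, probe=write_probe) -> bool:
    """Return True if the probe completes successfully within timeout_s seconds."""
    done = threading.Event()

    def run() -> None:
        try:
            probe(directory)
            done.set()
        except OSError:
            pass  # an I/O error is also reported as "not live"

    # Daemon thread: a probe stuck in the kernel must not block process exit.
    threading.Thread(target=run, daemon=True).start()
    return done.wait(timeout_s)


if __name__ == "__main__":
    data_dir = tempfile.mkdtemp()
    print("healthy disk live:", disk_is_live(data_dir, timeout_s=30.0))
    print("stuck disk live:", disk_is_live(data_dir, timeout_s=0.2,
                                           probe=simulated_stuck_probe))
```

      A monitor built on this could run the probe periodically and request a step-down only after the probe has been failing for the full 5-10 minute window, which keeps the reaction time well behind the Watchdog's 10-30 seconds.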

      Assigning to shameek.ray to make this blocked on the PM ticket he is creating.

            Assignee:
            shameek.ray@mongodb.com Shameek Ray
            Reporter:
            andrew.shuvalov@mongodb.com Andrew Shuvalov (Inactive)
            Votes:
            0
            Watchers:
            4

              Created:
              Updated:
              Resolved: