  1. Core Server
  2. SERVER-13811

Deal better/Fail more gracefully when mongoD runs out of disk space

    • Workload Scheduling

      Currently, when a mongoD process runs out of disk space and fails to preallocate a file or write to the journal, it responds by terminating the server process.

      This is a difficult state to recover from, because the remove operation itself will fail when attempting to reclaim space. Operations that write to disk temporarily, such as external sorts or temporary aggregation results, will fail as well.

      A more graceful approach would be to let us cap mongoD space utilization at some threshold short of a full disk, so that the system can still be cleaned up and stabilized.

      Something like "Stop accepting writes (other than removes) if less than 10% (or some number of GB) of disk space is available" or "If preallocation fails due to lack of space (2GB) for the final datafile, stop accepting writes aside from removes" would be much more graceful. This would of course mean $out and external sorts should fail as well, but it would save us from dealing with all the other issues associated with a full disk.
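
      To make the proposed rule concrete, here is a minimal sketch in Python of the admission check. It is an illustration only, not mongoD code: the DiskGuard class, the operation names, and the check_path helper are all hypothetical, and the thresholds are just the 10% and 2GB figures mentioned above.

```python
import shutil
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class DiskGuard:
    """Hypothetical write-admission check; not part of mongod."""

    min_free_fraction: float = 0.10  # the "10%" figure from the proposal
    min_free_bytes: int = 2 * GIB    # room for one final 2GB data file

    def low_on_space(self, total: int, free: int) -> bool:
        # Low if either the absolute or the relative floor is crossed.
        return free < self.min_free_bytes or free < total * self.min_free_fraction

    def allows(self, op: str, total: int, free: int) -> bool:
        # Removes stay allowed so space can be reclaimed; reads never write.
        if op in ("remove", "query"):
            return True
        # Everything else (insert, update, $out, external sort) is refused
        # once the volume is low on space.
        return not self.low_on_space(total, free)


def check_path(guard: DiskGuard, op: str, path: str = ".") -> bool:
    """Apply the guard to the volume that holds `path` (assumed data dir)."""
    usage = shutil.disk_usage(path)
    return guard.allows(op, usage.total, usage.free)
```

      With these defaults, an insert on a 100GB volume with 5GB free would be refused while a remove would still be accepted. Whether the two floors should be combined with "or" as here, or be separately configurable, is left open.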

      Of course there are edge cases to consider. For example, if a secondary hits this threshold, it can no longer replicate, so it should be marked as down or unavailable with respect to the quorum (which I believe already happens). But then how do we process cleanup if it can't replicate the removes? We'll just have to increase capacity or do a full resync in situations where a secondary runs out of disk before a primary.

      But for the general case, this would be a huge win, whether the number is configurable or not.

            Assignee: Unassigned
            Reporter: Osmar Olivo (osmar.olivo)
            Votes: 10
            Watchers: 28

              Created:
              Updated: