Type: New Feature
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Stability, Storage
Labels: or-workload-management
Assigned Teams: Workload Scheduling
Case:
Confidence Status: None
Work Order: 3
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name: None
Goal Link: None

Currently, when a mongod process runs out of disk space and fails to preallocate a file or write to the journal, it responds by terminating the server process.

This is a difficult state to recover from, because a remove operation will itself fail when it is used to reclaim space. Operations that write to disk temporarily, such as external sorts or temporary aggregation results, will fail as well.

A more graceful approach would be to let us cap mongod's disk usage at some threshold before the disk fills, so that there is still room to clean up and stabilize the system.

Something like "Stop accepting writes (other than removes) if less than 10% (or some number of GB) of disk space is available" or "If preallocation of the final datafile (2GB) fails due to lack of space, stop accepting writes aside from removes" would be much more graceful. This would of course mean that $out and external sorts should fail as well, but it would save us from dealing with all the other issues associated with a full disk.
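A minimal sketch of the proposed policy, for illustration only. Every name here (`low_on_space`, `accept_write`, `accept_write_on_path`, the operation labels, and the threshold parameters) is hypothetical and not part of mongod; the 10% default simply mirrors the example figure in this ticket.

```python
import shutil

# Hypothetical names throughout; none of this is MongoDB's API.
ALWAYS_ALLOWED = {"remove"}  # operations that can reclaim space


def low_on_space(free_bytes, total_bytes, min_free_fraction=0.10, min_free_bytes=None):
    """Return True when free space is below either configured threshold."""
    if total_bytes > 0 and free_bytes / total_bytes < min_free_fraction:
        return True
    return min_free_bytes is not None and free_bytes < min_free_bytes


def accept_write(op, free_bytes, total_bytes, **thresholds):
    """Reject space-consuming writes once the volume is low; always allow removes."""
    if op in ALWAYS_ALLOWED:
        return True
    return not low_on_space(free_bytes, total_bytes, **thresholds)


def accept_write_on_path(op, path, **thresholds):
    """Same check, reading the real free/total figures for the volume holding `path`."""
    usage = shutil.disk_usage(path)
    return accept_write(op, usage.free, usage.total, **thresholds)
```

Under this sketch, an insert, a `$out`, or an external sort would be refused once free space drops below the threshold, while a remove is still let through so that space can be reclaimed before the disk is completely full.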

Of course there are edge cases to consider. For example, if a secondary hits this threshold it can no longer replicate, so it should be marked as down or unavailable with respect to the quorum (which I believe already happens). But then how do we process cleanup if it can't replicate the removes? We'll just have to increase capacity or do a full resync in situations where a secondary runs out of disk before the primary.

But for the general case, this would be a huge win, whether the threshold is configurable or not.

is duplicated by

SERVER-15952 mongod hits assertion when run out of disk space

Closed

SERVER-15959 Running out of disk space should not entirely crash server

Closed

is related to

SERVER-3759 filesystem ops may cause termination when no space left on device

Closed

Assignee: Unassigned
Reporter: Osmar Olivo (Inactive)
Participants: Geert Bosch, Kyle Mertz, Osmar Olivo, Steven Vannelli
Votes: 10
Watchers: 28

Created: May 01 2014 09:21:19 PM UTC
Updated: May 01 2025 05:45:16 PM UTC
Resolved: May 01 2025 05:45:15 PM UTC
Confidence Status Last Update: 05/Apr/19 4:36 AM
