- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Aggregation Framework
- Fully Compatible
- ALL
- v4.2, v4.0, v3.6
- Query 2020-02-24
During query execution, when documents pass between the PlanStage tree and the pipeline of DocumentSources, they are first buffered in batches using a std::deque by the $cursor stage. The size of the batches is controlled by the internalDocumentSourceCursorBatchSizeBytes setParameter, which defaults to 4MB.
For count-like aggregation queries, this 4MB limit is not respected, leading to unbounded memory consumption. See the repro steps below for an example "count-like" query. In this query, the aggregation pipeline is responsible only for counting documents and does not require any data fields to be propagated from the PlanStage tree to the DocumentSource pipeline. This is implemented by pushing empty Documents onto the $cursor stage's std::deque. When the memory accounting code attempts to incorporate the size of these empty Documents, it calls Document::getApproximateSize(), which returns 0 for empty Documents. The accounted batch size therefore never increases and never reaches the limit, so the std::deque of empty Documents is allowed to grow without bound. In the repro described below, the deque becomes millions of elements long and consumes close to 1GB of memory.
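The accounting gap can be illustrated with a minimal, standalone sketch. It is not the server's code: `SketchDocument` and `fillBatch` are hypothetical stand-ins for Document and the $cursor stage's batching loop, and the only behavior taken from the description above is that an empty document reports a size of 0.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for Document: the reported size covers only the
// backing storage, so an empty document reports 0.
struct SketchDocument {
    std::size_t storageBytes;
    std::size_t getApproximateSize() const { return storageBytes; }
};

// Hypothetical batching loop: buffers up to `available` documents and stops
// early once the accounted size reaches `limitBytes`.
// Returns the number of documents now held in `batch`.
std::size_t fillBatch(std::deque<SketchDocument>& batch,
                      std::size_t available,
                      std::size_t docStorageBytes,
                      std::size_t limitBytes) {
    assert(limitBytes > 0);
    std::size_t accountedBytes = 0;
    for (std::size_t i = 0; i < available; ++i) {
        batch.push_back(SketchDocument{docStorageBytes});
        // With docStorageBytes == 0 this adds nothing, so the check below
        // never fires and the deque keeps growing.
        accountedBytes += batch.back().getApproximateSize();
        if (accountedBytes >= limitBytes) {
            break;
        }
    }
    return batch.size();
}
```

In this sketch, documents with non-empty storage stop the loop at the limit, while empty ones are all buffered regardless of how many are available.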
To fix this, we could explore a few approaches:
- Fix the memory accounting code to include the size of the Document itself, not just the DocumentStorage. Also account for any additional memory consumed by the std::deque.
- Change how count-like aggregates execute to avoid creating a large deque of empty documents. Theoretically, this buffering is unnecessary. We could simply discard a matching document and simultaneously increment the counter inside the $sum accumulator.
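Both approaches can be sketched in isolation. This is a rough illustration under assumptions, not a proposed patch: `PlaceholderDoc`, `fillBatchWithObjectSize`, and `countWithoutBuffering` are hypothetical names, and the first function charges only `sizeof` the element, leaving out the additional overhead of the std::deque itself.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for Document; empty documents report 0 storage bytes.
struct PlaceholderDoc {
    std::size_t storageBytes;
    std::size_t getApproximateSize() const { return storageBytes; }
};

// Approach 1 (sketch): charge each buffered element for the object itself in
// addition to its storage, so empty documents still count toward the limit.
std::size_t fillBatchWithObjectSize(std::deque<PlaceholderDoc>& batch,
                                    std::size_t available,
                                    std::size_t limitBytes) {
    assert(limitBytes > 0);
    std::size_t accountedBytes = 0;
    for (std::size_t i = 0; i < available; ++i) {
        batch.push_back(PlaceholderDoc{0});  // empty document
        accountedBytes += sizeof(PlaceholderDoc) + batch.back().getApproximateSize();
        if (accountedBytes >= limitBytes) {
            break;
        }
    }
    return batch.size();
}

// Approach 2 (sketch): skip buffering entirely. `hasNextMatch` is a
// hypothetical callable that returns true each time another document matches;
// the match is discarded and only the counter is incremented.
template <typename NextFn>
std::size_t countWithoutBuffering(NextFn hasNextMatch) {
    std::size_t count = 0;
    while (hasNextMatch()) {
        ++count;
    }
    return count;
}
```

With the first function, the batch length is bounded by roughly `limitBytes / sizeof(PlaceholderDoc)` even when every document is empty. The second keeps memory use constant no matter how many documents match.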