-
Type: Improvement
-
Resolution: Unresolved
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: Query Execution
-
None
-
Query Execution
SERVER-70395 recently improved the performance of spilling in SBE's HashAggStage by adopting the following algorithm. When the hash table exceeds its memory budget, the entire contents of the hash table are flushed to a TemporaryRecordStore and the hash table itself is cleared. This may happen many times as the input data is consumed. Importantly, the TemporaryRecordStore is sorted by the group-key, implemented by encoding the MaterializedRow for the key into the record store's RecordId. This means that once the data is consumed, there will be sequences of equal keys that are adjacent in the record store; a monotonically increasing counter is used to ensure that the {{RecordId}}s are unique. The partial aggregates can be merged to produce the final output using a single forwards pass over the spill table.
While spilling to a table sorted by group-by key leads to some nice simplicity in the implementation, anna.wawrzyniak@mongodb.com pointed out that it could result in bad IO access patterns. In particular, each time we spill new data from the hash table to the TemporaryRecordStore, we may need to write data to every page of the spill table.
As an alternative, we could look into always appending the newly spilled data (sorted by key) to the end of the TemporaryRecordStore. This would be similar to how spilling in DocumentSourceGroup works – it appends a new sorted segment to a spill file every time a spill event occurs. The benefit is that when we spill, we don't have to write new data to the pages that were written during a previous spill. When merging the partial aggregates, we would need to do a merge-sort of the spilled segments much like DocumentSourceGroup does. Another consideration is that if there are too many spilled segments, we could have a merge tree with depth greater than 1 to avoid having to merge too many segments at once.
- is related to
-
SERVER-70395 Slot-Based Engine too aggressively uses disk for $group and is slow
- Closed