Type: Task
Resolution: Fixed
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- FY2025Q2
- big

Assigned Teams:

Atlas Streams
Backwards Compatibility:
Fully Compatible
Sprint:
Sprint 47, Sprint 48, Sprint 49
Confidence Status:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

https://docs.google.com/document/d/1zPurErldtRGkl9COOM8jv_R_SAkfn4KW2Zd0temb71I/edit

Our streams memory tracking undercounts by 33%. Left unaddressed, this undercounting can lead to pod/OS “out of memory” (OOM) errors.
As part of this work, we should extend the testing to other common memory-intensive pipelines, since we might be missing other important allocation sites. Our “above the allocator” approach to memory tracking is hard to make fully accurate, so we can also consider multiplying the MemoryTracker numbers by 1.X to account for undercounting.
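To make the "multiply by 1.X" idea concrete, here is a rough sketch of how such a multiplier could be derived. It assumes "undercounts by 33%" means the tracker reports about 67% of the real usage; under the other reading (real usage is 1.33x the tracked value) the multiplier would simply be 1.33. The function names are hypothetical and are not part of MemoryTracker.

```python
def correction_factor(undercount_fraction: float) -> float:
    """Multiplier that maps tracked bytes back to estimated actual bytes.

    Assumes tracked = actual * (1 - undercount_fraction).
    """
    if not 0.0 <= undercount_fraction < 1.0:
        raise ValueError("undercount_fraction must be in [0, 1)")
    return 1.0 / (1.0 - undercount_fraction)


def estimate_actual_bytes(tracked_bytes: int, undercount_fraction: float) -> int:
    """Scale a tracked byte count by the correction factor (hypothetical helper)."""
    return round(tracked_bytes * correction_factor(undercount_fraction))


if __name__ == "__main__":
    factor = correction_factor(0.33)
    print(f"multiplier for a 33% undercount: {factor:.2f}")
    print(f"670 tracked bytes -> ~{estimate_actual_bytes(670, 0.33)} actual bytes")
```

A fixed multiplier only compensates for the average gap. It would not help with pipelines whose undercount is much larger than the measured one, which is why extending the tests to other pipelines still matters.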
One important aspect of this issue: for an $unwind that duplicates strings, we save memory through reference counting. However, when restoring from a checkpoint, we duplicate the string memory. This scenario is (in my opinion) not worth optimizing in checkpointing, but I’ve been able to cause a pod/OS OOM with this sort of pipeline during checkpoint restore. We should identify why the MemoryTracker is not catching this scenario.
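The gap between the reference-counted state and the restored state can be illustrated with a toy model. This is only a sketch of the general effect, not the actual Value/checkpoint code: a Python `bytearray` stands in for a string buffer, a list of references stands in for the $unwind output, and copying each element stands in for restore. The sizes and the `unique_bytes` helper are made up for illustration.

```python
def unique_bytes(buffers) -> int:
    """Sum the sizes of distinct buffer objects, counting shared ones once."""
    sizes_by_identity = {id(buf): len(buf) for buf in buffers}
    return sum(sizes_by_identity.values())


STRING_SIZE = 1024   # hypothetical size of the duplicated string
FAN_OUT = 1000       # hypothetical number of documents produced by $unwind

# Before checkpoint: every output document references the same buffer.
original = bytearray(b"x" * STRING_SIZE)
before_checkpoint = [original] * FAN_OUT

# After restore: each document gets its own copy of the string bytes.
after_restore = [bytearray(buf) for buf in before_checkpoint]

if __name__ == "__main__":
    print("bytes held before checkpoint:", unique_bytes(before_checkpoint))
    print("bytes held after restore:    ", unique_bytes(after_restore))
```

In this model the restored state holds FAN_OUT times more string bytes than the shared state, so a tracker that was calibrated against the shared state would see a very different footprint after restore.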

See the attached results.json[0]["heapProfileBeforeCheckpoint"] for the stacks reported by the heap profiler. Guessing a bit, but we might be undercounting in the string allocation stack here:

            "0": "tcmalloc::tcmalloc_internal::SampleifyAllocation<>()",
            "1": "slow_alloc<>()",
            "2": "mongo::ValueStorage::putString()",
            "3": "mongo::ExpressionConcat::evaluate()",
            "4": "mongo::projection_executor::ProjectionNode::applyExpressions()",


results.json
582 kB
Mar 12 2024 07:14:18 PM UTC


Assignee: Harendra Chawla

Reporter: Matthew Normyle

Participants: Harendra Chawla, Matthew Normyle

Votes: 0

Watchers: 2

Created: Mar 12 2024 04:59:54 PM UTC

Updated: Mar 11 2025 04:21:49 PM UTC

Resolved: May 22 2024 04:46:54 PM UTC

Confidence Status Last Update: 18/Apr/24 3:42 PM
