- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- None
- Storage Execution
- Fully Compatible
- ALL
- Execution Team 2024-11-25
- 200
The code asserts the following invariant:
randomSamplesPerMarker <= static_cast<uint64_t>(estimatedRecordsPerMarker)
randomSamplesPerMarker is a constant set to 10, whereas estimatedRecordsPerMarker is computed as follows:
double avgRecordSize = double(dataSize) / double(numRecords);
double estimatedRecordsPerMarker = std::ceil(minBytesPerMarker / avgRecordSize);
However, the invariant does not hold when the average record size is large relative to minBytesPerMarker, for instance when there is one very large record.
For example, suppose numRecords is reported as 1 and dataSize is reported as 16777328 bytes. With minBytesPerMarker at its default of 33554432 bytes (32 MiB):
- avgRecordSize = 16777328
- estimatedRecordsPerMarker = 2
- randomSamplesPerMarker = 10
This violates the invariant randomSamplesPerMarker <= estimatedRecordsPerMarker, since 10 > 2.
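The arithmetic in this example can be checked with a small standalone sketch. This is not the server code: the function names estimateRecordsPerMarker and samplingInvariantHolds and the constant name kRandomSamplesPerMarker are hypothetical and exist only for illustration. Only the two-line computation and the compared expression are taken from this ticket.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Value of randomSamplesPerMarker as stated in this ticket.
// The name kRandomSamplesPerMarker is hypothetical.
constexpr std::uint64_t kRandomSamplesPerMarker = 10;

// Hypothetical helper that repeats the computation quoted in the ticket.
double estimateRecordsPerMarker(std::int64_t dataSize,
                                std::int64_t numRecords,
                                std::int64_t minBytesPerMarker) {
    // Guard for this sketch only: avoid dividing by zero.
    assert(dataSize > 0 && numRecords > 0);
    double avgRecordSize = double(dataSize) / double(numRecords);
    return std::ceil(minBytesPerMarker / avgRecordSize);
}

// Hypothetical helper: evaluates the condition the invariant checks,
// returning false instead of aborting when it does not hold.
bool samplingInvariantHolds(std::int64_t dataSize,
                            std::int64_t numRecords,
                            std::int64_t minBytesPerMarker) {
    double estimatedRecordsPerMarker =
        estimateRecordsPerMarker(dataSize, numRecords, minBytesPerMarker);
    return kRandomSamplesPerMarker <=
           static_cast<std::uint64_t>(estimatedRecordsPerMarker);
}
```

With the values from this report, samplingInvariantHolds(16777328, 1, 33554432) evaluates to false because the estimate is 2. The condition fails whenever avgRecordSize exceeds minBytesPerMarker / 9, since the ceiling then comes out below 10.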