Core Server / SERVER-97435

Fault in pre-image sampling arithmetic

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 8.1.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • None
    • Assigned Teams: Storage Execution
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Sprint: Execution Team 2024-11-25
    • 200

      The code asserts the following condition with an invariant:

        randomSamplesPerMarker <= static_cast<uint64_t>(estimatedRecordsPerMarker)

      randomSamplesPerMarker is a constant set to 10, whereas estimatedRecordsPerMarker is computed as follows:

        double avgRecordSize = double(dataSize) / double(numRecords);
        double estimatedRecordsPerMarker = std::ceil(minBytesPerMarker / avgRecordSize); 
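      The following is a minimal standalone C++ sketch, not the server code, that reproduces the two formulas above and the asserted condition so the arithmetic can be checked in isolation. The constants are hard-coded from this report, and the names kRandomSamplesPerMarker, kMinBytesPerMarker, MarkerEstimate, estimate() and invariantHolds() are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical standalone reproduction of the arithmetic quoted in this issue.
// Only the two formulas and the asserted condition come from the report; all
// names below are illustrative, not the server's identifiers.
constexpr uint64_t kRandomSamplesPerMarker = 10;    // constant from the report
constexpr int64_t kMinBytesPerMarker = 33'554'432;  // 32 MiB default

struct MarkerEstimate {
    double avgRecordSize;
    double estimatedRecordsPerMarker;
};

// Same two steps as the quoted code: average record size, then the number of
// such records needed to fill one marker, rounded up.
MarkerEstimate estimate(int64_t dataSize, int64_t numRecords) {
    assert(numRecords > 0);  // this sketch does not handle an empty collection
    double avgRecordSize = double(dataSize) / double(numRecords);
    double estimatedRecordsPerMarker =
        std::ceil(double(kMinBytesPerMarker) / avgRecordSize);
    return {avgRecordSize, estimatedRecordsPerMarker};
}

// The condition that the server code checks with an invariant.
bool invariantHolds(const MarkerEstimate& e) {
    return kRandomSamplesPerMarker <=
           static_cast<uint64_t>(e.estimatedRecordsPerMarker);
}
```

      For instance, estimate(1'000'000, 10'000) (100-byte records) gives 335545 records per marker and the check passes, while estimate(16'777'328, 1), the case from this report, gives 2 and the check fails.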
      

      However, this does not hold when the collection contains a single very large record: a large avgRecordSize drives estimatedRecordsPerMarker below 10.

      For example, suppose numRecords is reported as 1 and dataSize as 16777328 bytes, with minBytesPerMarker at its default of 33554432 bytes (32 MiB). Then:

      • avgRecordSize = 16777328 / 1 = 16777328
      • estimatedRecordsPerMarker = ceil(33554432 / 16777328) = ceil(≈1.99999) = 2
      • randomSamplesPerMarker = 10

      This violates the invariant randomSamplesPerMarker <= estimatedRecordsPerMarker, since 10 > 2.
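      Working the arithmetic in exact terms (ignoring floating-point rounding) shows where the invariant starts to fail in general. Write M for minBytesPerMarker, a for avgRecordSize and k for randomSamplesPerMarker, and use the fact that ceil(x) >= k exactly when x > k - 1 for an integer k:

```latex
\left\lceil \frac{M}{a} \right\rceil \ge k
\iff \frac{M}{a} > k - 1
\iff a < \frac{M}{k - 1}
= \frac{33\,554\,432}{9} \approx 3\,728\,270\ \text{bytes} \approx 3.56\ \text{MiB}
```

      So with the default minBytesPerMarker and k = 10, any collection whose average record size is at least about 3.56 MiB violates the invariant; the 16777328-byte record in this report is well past that threshold.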

            Assignee:
            haley.connelly@mongodb.com Haley Connelly
            Reporter:
            haley.connelly@mongodb.com Haley Connelly
            Votes:
            0
            Watchers:
            3
