• Type: Sub-task
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Execution
    • Fully Compatible
    • QE 2024-04-15, QE 2024-04-29, QE 2024-05-13

      Classic $group (the document source) uses ValueUnorderedMap, which is defined as absl::node_hash_map. If we don't keep pointers into this map, we should look into changing the implementation to use absl::flat_hash_map, which requires fewer heap allocations and less indirection. (The condition matters because flat_hash_map does not keep pointers to its elements stable across rehashes, while node_hash_map does.)

      MongoDB Enterprise > db.bar.aggregate([{$group: {_id: null, total: {$sum: "$a"}}}])
      { "_id" : null, "total" : 26624000 } 

      I do not see a noticeable speedup in SBE for the query above; every "a" field has the value 2. Classic mode was ~4% faster, but this query runs in SBE by default. Are there queries that run in classic by default that could see some speedup?

      diff.diff

      Here's a $group query that runs in classic by default:

      db.weather.insertMany( [
         {
             "metadata": { "sensorId": 5578, "type": "temperature" },
             "timestamp": ISODate("1990-05-18T00:00:00.000Z"),
             "temp": 12
         },
         {
             "metadata": { "sensorId": 5578, "type": "temperature" },
             "timestamp": ISODate("2021-05-18T04:00:00.000Z"),
             "temp": 11
         },
         {
             "metadata": { "sensorId": 5578, "type": "temperature" },
             "timestamp": ISODate("2021-05-18T08:00:00.000Z"),
             "temp": 11
         },
         {
             "metadata": { "sensorId": 5578, "type": "temperature" },
             "timestamp": ISODate("2021-05-18T12:00:00.000Z"),
             "temp": 12
         }
      ] )
      db.weather.aggregate( [
         {
            $densify: {
               field: "timestamp",
               range: {
                  step: 1,
                  unit: "minute",
                  bounds:[ ISODate("1990-05-18T00:00:00.000Z"), ISODate("2021-05-18T04:00:00.000Z") ]
               }
            }
         },
         {
            $group: {
                _id: {"$dateTrunc":{"date":"$timestamp","unit":"year"}},
                count: { $count: { } }
            }
         }
      ] )
       

      Running it requires increasing this knob first:

      db.adminCommand({ setParameter: 1, "internalQueryMaxAllowedDensifyDocs": 50000000 }) 

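      The query takes about 15 seconds, about 3% of which is spent in the hash table's try_emplace_back. In the timings below, the "flat" column appears to mark runs with the absl::flat_hash_map change applied.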

      |------+----------------+-------|
      | flat | queryFramework |    ms |
      |------+----------------+-------|
      | no   | classic        | 16729 |
      | no   | classic        | 16726 |
      |------+----------------+-------|
      | yes  | classic        | 16285 |
      | yes  | classic        | 16280 |
      | yes  | classic        | 16291 |
      |------+----------------+-------|
      | yes  | sbe            | 16301 |
      | yes  | sbe            | 16289 |
      |------+----------------+-------|
      | no   | classic        | 16817 |
      | no   | classic        | 16923 |
      | no   | classic        | 16918 |
      |------+----------------+-------|
      | no   | sbe            | 16885 |
      | no   | sbe            | 16896 |
      | no   | sbe            | 16959 |
      |------+----------------+-------|
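
      To put the timings above in relative terms, here is a rough sketch that pools every run per (flat, queryFramework) pair and compares the means. It assumes "flat = no" means the existing absl::node_hash_map and "flat = yes" means absl::flat_hash_map, and that runs from both batches are comparable; the helper names (runs, mean, reductionPct) are made up for illustration. It is plain JavaScript and does not need a server.

```javascript
// Timings (ms) copied from the table above, keyed by "<flat>/<queryFramework>".
// Assumption: "no" = absl::node_hash_map, "yes" = absl::flat_hash_map.
const runs = {
  "no/classic": [16729, 16726, 16817, 16923, 16918],
  "yes/classic": [16285, 16280, 16291],
  "no/sbe": [16885, 16896, 16959],
  "yes/sbe": [16301, 16289],
};

// Arithmetic mean of an array of numbers.
function mean(xs) {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Percentage reduction in mean wall time when going from node to flat
// for the given query framework ("classic" or "sbe").
function reductionPct(framework) {
  const node = mean(runs[`no/${framework}`]);
  const flat = mean(runs[`yes/${framework}`]);
  return (100 * (node - flat)) / node;
}

for (const framework of ["classic", "sbe"]) {
  console.log(`${framework}: ${reductionPct(framework).toFixed(2)}% less time with flat`);
}
```

      Under those assumptions this works out to roughly 3.2% for classic and 3.7% for sbe. With only two to five runs per cell, the numbers are indicative rather than statistically solid.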

        1. tpch-results-node.txt
          7 kB
        2. tpch-results-flat.txt
          5 kB
        3. Screenshot 2024-05-02 at 5.59.27 PM.png
          64 kB
        4. diff.diff
          7 kB
        5. 45s-tpch-q10-1-denorm-node-hash-set.svg
          409 kB
        6. 45s-tpch-q10-1-denorm-flat-hash-map.svg
          399 kB

            Assignee:
            evan.bergeron@mongodb.com Evan Bergeron
            Reporter:
            matt.boros@mongodb.com Matt Boros
