Loading...

XML

Word

Printable

JSON

Type: Improvement
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 6.0.0-rc0
Affects Version/s: None
Component/s: None
Labels:
None

Backwards Compatibility:
Fully Compatible
Linked BF Score:
135

Ad-hoc tests suggest that we can improve performance of last point queries by changing our query rewrite.

Consider the following user-specified query:

 db.telemetry.aggregate([
    {$sort: {"metadata.sensorId": 1, "timestamp": 1}},
    {$group: {
      _id: "$metadata.sensorId",
      ts: {$last: "$timestamp"},
      temp: {$last: "$temp"}
    }}
 ]);

which we usually would rewrite to:

db.system.buckets.telemetry.aggregate([
   {$sort: {"meta.sensorId": 1, "control.max.timestamp": -1}},
   {$group: {
      _id: "$meta.sensorId",
      bucket: {$first: "$_id"},
   }},
   {$lookup: {
      from: "system.buckets.telemetry",
      foreignField: "_id",
      localField: "bucket",
      as: "bucket_data",
      pipeline:[
         {$_internalUnpackBucket: {
            timeField:"timestamp",
            metaField:"tags",
            bucketMaxSpanSeconds:NumberInt("60")
         }},
         {$sort: {"timestamp": -1}},
         {$limit:1}
      ]
   }},
   {$unwind: "$bucket_data"},
   {$replaceWith:{
      _id: "$_id",
      ts: "$bucket_data.timestamp",
      temp: "$bucket_data.temp"
   }}
]);

We actually can get the same results with slightly better runtime by avoiding the $lookup using the following alternative rewrite:

db.system.buckets.telemetry.aggregate([
   {$sort: {"meta.sensorId": 1, "control.max.timestamp": -1}},
   {$group: {
      _id: "$meta.sensorId",
      bucket: {$first: "$_id"},
      control: {$first: "$control"},
      meta: {$first: "$meta"},
      data: {$first: "$data"} 
   }},
   {$_internalUnpackBucket: {
      timeField:"timestamp",
      metaField:"meta",
      bucketMaxSpanSeconds:NumberInt("60")
   }},
   {$sort: {"meta.sensorId": 1, "timestamp": -1}},
   {$group: {
      _id: "$meta.sensorId",
      ts: {$first: "$timestamp"},
      temp: {$first: "$temp"}
   }}
]);

This optimization was suggested by a comment from david.percy on the tech design for PM-2330:

https://docs.google.com/document/d/1JmOlJh8NmgUa1tDZyHElCk0YlAJSxW0gURm4qGYn5AU/edit?disco=AAAAVXqkyAo

The tweak described above improved runtime from 210ms to 140ms in my tests with a debug build on the following data set: https://gist.github.com/starzia/9d1f8a25a2e2e2124b78e2da71159602

However, genny tests showed every larger larger latency improvements – more than a 7x speedup

is duplicated by

SERVER-61659 TS Last Point opt: final test plan review

Closed

Assignee:: Alya Berciu

Reporter:: Steve Tarzia

Participants:: Alya Berciu, Githook User, Steve Tarzia

Votes:: 0 Vote for this issue

Watchers:: 6 Start watching this issue

Created:: Mar 28 2022 08:55:05 PM UTC

Updated:: Oct 29 2023 09:40:26 PM UTC

Resolved:: Apr 07 2022 09:35:45 AM UTC

Details

Description

Attachments

Issue Links

Activity

People

Dates