-
Type: Improvement
-
Resolution: Won't Do
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
QE 2023-02-06, QE 2023-02-20, QE 2023-03-06, QE 2023-03-20, QE 2023-04-03, QE 2023-04-17, QE 2023-05-01
Follow up to streaming group, this has the potential to remove sort if all buckets are sorted(or from compressed format). Could be useful to test the solution first before committing to make the change.
- Coarse Grained Streaming Group
- Observation: locally sorted runs, globally “coarsely” sorted. Underscored numbers are min time for each bucket.
- 10,11,12,13,14,13,14,15,16,17,16,17,18.19.20
- Leading active groups can be closed at “sorted run boundary”, [...] represent groups, closed groups have strike-through markings.
- 10,11,12,13,14 [10-12],[13-15]
- 10,11,12,13,14,13
[10-12],[13-15] - 10,11,12,13,14,13,14,15,16,17 [13-15],[16-18]
- 10,11,12,13,14,13,14,15,16,17,16
[13-15],[16-18]
- Current State(as of PM-3050): Current Streaming group requires global sort (optimized as “bounded” sort needing a local heap to sort the run up to the next "bound").
- 10,11,12,13,13,14,14,15,16,16,17,17,18.19.20
- Current State: but can close the leading group earlier (less memory usage) once input is sorted (group bounds to be compared to the next value to determine whether to output an active group)
- 10,11,12 [10-12]
- 10,11,12,13
[10-12],[13-15] - 10,11,12,13,13,14,14,15,16
[13-15],[16-18]
- How early is a function of the group size and amount of overlap
- This proposal essentially combines run boundary detection with grouping (ie compare group bundary against run boundary to determine whether to stream the results from an active group or not)
- We believe this would improve the query performance due to one fewer step in the pipeline as the sort could be eliminated.
- How to detect “sorted run boundary”: Extra info to indicate bucket start/end is provided in the stream.
- Borrow from the “bounded sort” implementation:
- “ It also can see the control.min.time value of the containing bucket. Whenever it sees the control.min.time change, it knows that no subsequent input will have a lower value, and so anything below this value can be returned.”
- Info made available to sort via BoundMakerMin/Max
- Borrow from the “bounded sort” implementation:
- Observation: locally sorted runs, globally “coarsely” sorted. Underscored numbers are min time for each bucket.