- Type: Task
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Atlas Streams
- Fully Compatible
- Sprint 43, Sprint 44
Currently, the sink in a streams pipeline adds the "_stream_meta" field to each document it writes. The "_stream_meta" field contains information about the source and the window.
For example:
{
  ...
  _stream_meta: {
    sourceType: 'atlas',
    windowStartTimestamp: ISODate("2024-01-19T20:10:04.000Z"),
    windowEndTimestamp: ISODate("2024-01-19T20:10:06.000Z")
  }
}
However, users cannot reference the "_stream_meta" field in expressions in their pipeline. For example, the $project below does not work:
[
  ...
  {
    '$tumblingWindow': {
      interval: { size: 2, unit: 'second' },
      pipeline: [
        { '$group': { _id: null, count: { $sum: 1 } } }
      ]
    }
  },
  { $project: { _id: "$_stream_meta.windowStartTimestamp" } },
  ...
]
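To make the gap concrete, here is a toy model in plain JavaScript. It is not how Atlas Stream Processing is implemented. It assumes the metadata travels beside the document and is merged in only at the sink, which is one reading of the behavior described above. The helpers `resolvePath`, `project`, and `writeToSink` and the `count: 3` value are hypothetical and exist only for illustration.

```javascript
// Toy model, not Atlas Stream Processing internals: every name here is hypothetical.

// Resolve a "$a.b" style field path against a document; missing paths yield undefined.
function resolvePath(doc, path) {
  return path
    .replace(/^\$/, "")
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

// Minimal $project: each output field is a field-path expression.
function project(doc, spec) {
  const out = {};
  for (const [field, expr] of Object.entries(spec)) {
    const value = resolvePath(doc, expr);
    if (value !== undefined) out[field] = value; // missing paths are omitted
  }
  return out;
}

// Assumption: the metadata is merged into the document only at the sink.
function writeToSink(doc, streamMeta) {
  return { ...doc, _stream_meta: streamMeta };
}

const streamMeta = {
  sourceType: "atlas",
  windowStartTimestamp: new Date("2024-01-19T20:10:04.000Z"),
  windowEndTimestamp: new Date("2024-01-19T20:10:06.000Z"),
};

// Output of the window's $group stage; it carries no _stream_meta field yet.
const windowOutput = { _id: null, count: 3 };

const projected = project(windowOutput, { _id: "$_stream_meta.windowStartTimestamp" });
const written = writeToSink(projected, streamMeta);

console.log(projected); // the _id field is absent because the path did not resolve
console.log(written);   // _stream_meta appears only in the document written to the sink
```

In this model the expression has nothing to resolve against, because the field exists only after the sink step. Any behavior we define would need to make the metadata visible to expression evaluation earlier than that.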
In this ticket, we need to work with PMs to define the desired behavior for "_stream_meta" and write a brief technical doc describing how we will implement it.
Regarding the behavior, we need to define things like:
Using the below pipeline as an example, should $project1, $match, $project3, and $project4 all be able to use the "_stream_meta" fields in the documents?
[
  $source,
  $project1,
  $tumblingWindow: {
    pipeline: [
      $match: { some expression on _stream_meta.windowStartTimestamp },
      $group: { _id: null, count: { $sum: 1 } },
      $project3
    ]
  },
  $project4
]