- Type: Task
- Resolution: Duplicate
- Priority: Major - P3
- Component/s: None
TBD
Description of Linked Ticket
Summary
We will implement a new aggregation stage, $union, that merges the results of n pipelines while preserving duplicates. To enable merging data from multiple collections, we will also introduce an explicit stage for referencing a collection, $collection.
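The intended semantics can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the proposed server syntax: the `collection` and `union` helpers, the `COLLECTIONS` data, and the collection names are all hypothetical, and the output order (each pipeline's results in argument order) is an assumption the ticket does not specify.

```python
from itertools import chain
from typing import Callable, Dict, Iterable, Iterator, List

Document = Dict[str, object]
Pipeline = Callable[[], Iterable[Document]]

# Hypothetical in-memory "collections"; names and contents are invented for illustration.
COLLECTIONS: Dict[str, List[Document]] = {
    "events_2019": [{"id": 1, "kind": "click"}, {"id": 2, "kind": "view"}],
    "events_2020": [{"id": 2, "kind": "view"}, {"id": 3, "kind": "click"}],
}


def collection(name: str) -> Pipeline:
    """Model of the proposed $collection stage: a pipeline that reads one named collection."""
    return lambda: iter(COLLECTIONS[name])


def union(*pipelines: Pipeline) -> Iterator[Document]:
    """Model of the proposed $union stage: concatenate the outputs of n pipelines.

    Duplicates are preserved (like SQL UNION ALL); no deduplication is attempted.
    """
    return chain.from_iterable(p() for p in pipelines)


merged = list(union(collection("events_2019"), collection("events_2020")))
for doc in merged:
    print(doc)
```

In this model the document with `id` 2 appears in both inputs and therefore twice in `merged`, which is the "preserving duplicates" behavior described above.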
Motivation
Union is a fundamental operation in relational algebra. We have several specific scenarios:
- BI Connector (BIC), for completeness with SQL.
- Time-series scenarios, to combine data stored in per-period collections into one logical collection.
- Data Lake, to combine collections such as archival and recent data, or data from different regions.
For analytical scenarios, customers expect a complete set of fundamental operations. For example, for Tableau, union and unpivot were the top requested features after joins. In the future, we will improve $lookup, but delivering general and performant joins is a hard task. At the same time, union-like logic is already supported in the backend for operations that merge results across shards.
Documentation
- depends on
  - CSHARP-2863 Ability to specify union (Closed)
  - JAVA-3520 Ability to specify union (Closed)