-
Type: New Feature
-
Resolution: Duplicate
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: None
-
Query Execution
Mongosync applies document queries in two contexts:
1) partitioning during initial sync
2) cluster-wide change streams
The initial-sync queries are per-collection and so use each collection's default collation. The change stream, though, spans multiple collections, so it uses the simple (binary) collation. Thus, if a collection's default collation is case-insensitive and we search on "_id > aaa && _id < zzz", we'll match _id=BBB during initial sync but not in the change stream, because under binary comparison "BBB" sorts before "aaa".
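The mismatch can be reproduced with plain string comparison. The following is a minimal sketch, not mongosync or server code: the function names are hypothetical, byte-wise comparison stands in for the simple collation, and `str.casefold()` is only a rough approximation of a case-insensitive collation (real collations use ICU rules).

```python
def in_range_simple(value: str, low: str, high: str) -> bool:
    """Range check using byte-wise ordering, standing in for the simple collation."""
    v, lo, hi = (s.encode("utf-8") for s in (value, low, high))
    return lo < v < hi


def in_range_case_insensitive(value: str, low: str, high: str) -> bool:
    """Range check after case folding; an approximation of a case-insensitive collation."""
    return low.casefold() < value.casefold() < high.casefold()


# The query from the description: _id > "aaa" && _id < "zzz", tested against _id = "BBB".
matches_in_initial_sync = in_range_case_insensitive("BBB", "aaa", "zzz")
matches_in_change_stream = in_range_simple("BBB", "aaa", "zzz")
```

Under these assumptions the two checks disagree for "BBB": uppercase ASCII letters have smaller byte values than lowercase ones, so the byte-wise check places "BBB" below "aaa" and outside the range.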
SERVER-82815 will provide a solution for this by allowing aggregation to convert _id, aaa, zzz, and BBB to whatever byte sequence the server uses to represent them in indexes.
This problem worsens in the context of [document filtering|REP-1954], where the query will come from the customer. Here we either have to limit the scope of support for strings in queries pretty dramatically or implement some sort of query-transform logic based on SERVER-82815's new operator ... but even that would likely only support certain limited use cases.
We can soften the problem somewhat by having customers migrate like-collated collections in concurrent mongosync sessions. Given limitations on the number of concurrent change streams, though, this won't scale well to multi-tenant setups where dozens, even hundreds, of collations may coexist on a given source cluster.
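As a rough illustration of that workaround, the sketch below groups collections by their default collation so that each group could be assigned to its own mongosync session. It assumes collection metadata shaped like `listCollections` output (a `name` plus an optional `options.collation` document); the helper name and the sample data are hypothetical, and how each group is handed to a session is not shown.

```python
import json
from collections import defaultdict


def group_by_collation(collections):
    """Map a canonical collation key to the names of collections that share it."""
    groups = defaultdict(list)
    for coll in collections:
        collation = coll.get("options", {}).get("collation")
        # Collections without a default collation use the simple collation.
        key = json.dumps(collation, sort_keys=True) if collation else "simple"
        groups[key].append(coll["name"])
    return dict(groups)


# Hypothetical metadata for three collections on a source cluster.
sample = [
    {"name": "users", "options": {"collation": {"locale": "en", "strength": 2}}},
    {"name": "orders", "options": {}},
    {"name": "tenants", "options": {"collation": {"strength": 2, "locale": "en"}}},
]
groups = group_by_collation(sample)
```

The number of keys in `groups` is the number of concurrent sessions this approach would need, which is where the change stream limit becomes the bottleneck.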
It seems that, ultimately, we can't "gracefully" support collations without some ability to apply multiple collations in a given change stream.
- duplicates
-
SERVER-25954 Support more granular collation specification
- Backlog
- split from
-
SERVER-82815 Expose server’s index key creation via aggregation
- Closed