Type: New Feature
Resolution: Done
Priority: Major - P3
Affects Version/s: None
Component/s: Aggregation Framework
None
Fully Compatible
Query 12 (04/04/16)
Syntax
{$substrBytes: [ <string expression>, <byte index>, <byte count> ] }
{$substrCP: [ <string expression>, <code point index>, <code point count> ] }
Examples
Input
{_id: 0, string: "ελληνικά"}
Pipeline
db.coll.aggregate([{ $project: { byteSubstr: {$substrBytes: ["$string", 0, 4]}, cpSubstr: {$substrCP: ["$string", 0, 4]} } }])
Output
{_id: 0, byteSubstr: "ελ", cpSubstr: "ελλη"}
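A minimal sketch in plain JavaScript (Node.js, no MongoDB required) of why the two outputs differ. Each Greek letter here is two bytes in UTF-8, so bytes [0, 4) hold two letters while code points [0, 4) hold four. This only illustrates the counting; it is not the server implementation.

```javascript
// Illustration only: mimics the example's counting, not MongoDB's code.
const s = "ελληνικά";

// UTF-8 encoding of the string; every Greek letter here takes 2 bytes.
const utf8 = new TextEncoder().encode(s);

// Bytes [0, 4) cover exactly the first two letters.
const byteSubstr = new TextDecoder("utf-8").decode(utf8.subarray(0, 4));

// Array.from iterates a JS string by code point, not by UTF-16 unit.
const cpSubstr = Array.from(s).slice(0, 4).join("");

console.log(byteSubstr, cpSubstr); // ελ ελλη
```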
Additional Notes
- This ticket does not add any new query functionality for working with strings; it only adds the aggregation expressions above.
- $substrBytes will error if it starts or ends in the middle of a code point.
- $substrCP will error on any input that is detected to be invalid UTF-8.
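The two error rules can be sketched with hypothetical helpers `substrBytesSketch` and `substrCPSketch` (illustrative names, not MongoDB functions). Assumptions: a byte boundary is treated as invalid when it lands on a UTF-8 continuation byte (bit pattern 10xxxxxx), and invalid UTF-8 is detected with a fatal `TextDecoder`. The server's actual checks and error messages may differ.

```javascript
// Hypothetical helpers mimicking the two error rules; not MongoDB's implementation.
const decoder = new TextDecoder("utf-8", { fatal: true }); // throws on invalid UTF-8

// A UTF-8 continuation byte has the bit pattern 10xxxxxx.
function isContinuationByte(b) {
  return (b & 0xc0) === 0x80;
}

// Byte-based substring that refuses to split a code point.
function substrBytesSketch(str, start, count) {
  const bytes = new TextEncoder().encode(str);
  const end = Math.min(start + count, bytes.length);
  // A start on a continuation byte begins mid code point.
  if (start < bytes.length && isContinuationByte(bytes[start])) {
    throw new Error("start index is in the middle of a UTF-8 code point");
  }
  // bytes[end] is the first excluded byte; if it continues a sequence,
  // the substring would end mid code point.
  if (end < bytes.length && isContinuationByte(bytes[end])) {
    throw new Error("end index is in the middle of a UTF-8 code point");
  }
  return decoder.decode(bytes.subarray(start, end));
}

// Code-point substring over raw bytes, so invalid UTF-8 can be represented.
function substrCPSketch(bytes, start, count) {
  const str = decoder.decode(bytes); // throws a TypeError on invalid UTF-8
  return Array.from(str).slice(start, start + count).join("");
}

const greek = "ελληνικά";
console.log(substrBytesSketch(greek, 0, 4)); // ελ
console.log(substrCPSketch(new TextEncoder().encode(greek), 0, 4)); // ελλη
```

With these helpers, `substrBytesSketch(greek, 0, 3)` throws because byte 3 is the second byte of "λ", and `substrCPSketch(new Uint8Array([0x61, 0xc3]), 0, 1)` throws because 0xC3 starts a two-byte sequence that is never completed.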
Original Description
The current expression $substr and the proposed expression $length (see SERVER-14670) work in terms of bytes in the string. Sometimes it is desirable to work in terms of code points instead, so we should add equivalent expressions that work in terms of code points.
For example, {$substr: ["\uD834\uDF06", 0, 1]} would be an error (the byte at index 1 is a UTF-8 continuation byte, so the substring would end in the middle of a code point), but {$cpSubstr: ["\uD834\uDF06", 0, 1]} would be "\uD834\uDF06".
Correspondingly, {$length: "\uD834\uDF06"} would be 4 (U+1D306 is four bytes in UTF-8), but {$cpLength: "\uD834\uDF06"} would be 1.
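A small sketch, again in plain JavaScript, counting "\uD834\uDF06" (U+1D306) three ways. BSON stores strings as UTF-8, where this code point takes four bytes; JavaScript's own `length` property reports 2 because it counts UTF-16 code units. This illustrates the units only and does not call any MongoDB expression.

```javascript
// One code point (U+1D306) written as a UTF-16 surrogate pair.
const s = "\uD834\uDF06";

const utf8 = new TextEncoder().encode(s);
const byteLength = utf8.length;          // UTF-8 bytes: F0 9D 8C 86
const utf16Units = s.length;             // JS strings count UTF-16 code units
const codePoints = Array.from(s).length; // string iteration is by code point

// Byte 1 is a continuation byte (10xxxxxx), so a byte substring [0, 1)
// would end in the middle of the code point.
const splitsCodePoint = (utf8[1] & 0xc0) === 0x80;

console.log(byteLength, utf16Units, codePoints, splitsCodePoint); // 4 2 1 true
```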
- is depended on by: CSHARP-1622 Add $cpLength and $cpSubstr expressions which work via code points (Closed)
- related to: DRIVERS-297 Aggregation Framework Support for 3.4 (Closed)