  1. Core Server
  2. SERVER-22580

Add $cpLength and $cpSubstr expressions which work via code points

    • Type: New Feature
    • Resolution: Done
    • Priority: Major - P3
    • 3.3.4
    • Affects Version/s: None
    • Component/s: Aggregation Framework
    • None
    • Fully Compatible
    • Query 12 (04/04/16)
    • 0

      Syntax

      {$substrBytes: [ <string>, <expression>, <expression>] }
      {$substrCP: [ <string>, <expression>, <expression>] }
      

      Examples

      Input

      {_id: 0, string: "ελληνικά"}
      

      Pipeline

      db.coll.aggregate([{
          $project: {
              byteSubstr: {$substrBytes: ["$string", 0, 4]},
              cpSubstr: {$substrCP: ["$string", 0, 4]}
          }
      }])
      

      Output

      {_id: 0, byteSubstr: "ελ", cpSubstr: "ελλη"}
      

      Additional Notes

      • Will not add any new query functionality to work with strings.
      • $substrBytes will error if it starts or ends in the middle of a code point.
      • $substrCP will error on any input that is detected to be invalid UTF-8.
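The boundary rule in the notes above can be illustrated outside the server. The following is a minimal JavaScript sketch, assuming a runtime that provides `TextEncoder` and `TextDecoder` (modern browsers, Node.js). The helpers `isContinuationByte`, `substrBytes` and `substrCP` are hypothetical emulations written for this example, not MongoDB's implementation, and they assume non-negative indexes.

```javascript
// Illustrative emulation only; these helper names are hypothetical,
// not MongoDB's code.
const utf8Encoder = new TextEncoder();
const utf8Decoder = new TextDecoder("utf-8");

// UTF-8 continuation bytes have the bit pattern 10xxxxxx.
function isContinuationByte(byte) {
  return (byte & 0xc0) === 0x80;
}

// Byte-based substring: rejects ranges that split a code point.
function substrBytes(str, start, length) {
  const bytes = utf8Encoder.encode(str);
  const begin = Math.min(start, bytes.length);
  const end = Math.min(start + length, bytes.length);
  if (begin < bytes.length && isContinuationByte(bytes[begin])) {
    throw new Error("substring starts in the middle of a code point");
  }
  if (end < bytes.length && isContinuationByte(bytes[end])) {
    throw new Error("substring ends in the middle of a code point");
  }
  return utf8Decoder.decode(bytes.subarray(begin, end));
}

// Code-point-based substring: Array.from splits a string by code point.
function substrCP(str, start, length) {
  return Array.from(str).slice(start, start + length).join("");
}

const input = "ελληνικά";
console.log(substrBytes(input, 0, 4)); // "ελ"   (each Greek letter is 2 bytes)
console.log(substrCP(input, 0, 4));    // "ελλη" (4 code points)
```

Calling `substrBytes(input, 0, 3)` throws in this sketch, because byte 3 is the continuation byte of the second "λ". The sketch does not emulate the invalid UTF-8 check for `$substrCP`, since JavaScript strings are not stored as UTF-8.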

      Original Description

      The current expression $substr and the proposed expression $length (see SERVER-14670) operate on the bytes of the string. Sometimes it is desirable to operate on code points instead, so we should add equivalent expressions that work with code points.

      For example, {$substr: ["\uD834\uDF06", 0, 1]} would be an error (the second byte is a continuation byte, so the range ends in the middle of a code point), but {$cpSubstr: ["\uD834\uDF06", 0, 1]} would be "\uD834\uDF06".

      Correspondingly, {$length: "\uD834\uDF06"} would be 4 (the character occupies four bytes in UTF-8), but {$cpLength: "\uD834\uDF06"} would be 1.
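The difference between counting bytes and counting code points can be checked outside the server. The following is a minimal JavaScript sketch, assuming a runtime that provides `TextEncoder`. The helpers `lengthInBytes` and `lengthInCodePoints` are hypothetical names for this example, not server functions. Note that U+1D306 takes four bytes in UTF-8, while JavaScript's own `.length` counts two UTF-16 code units.

```javascript
// U+1D306 written as a UTF-16 surrogate pair, as in the example above.
const tetragram = "\uD834\uDF06";

// Hypothetical helper: number of bytes in the UTF-8 encoding of the string.
function lengthInBytes(str) {
  return new TextEncoder().encode(str).length;
}

// Hypothetical helper: number of Unicode code points in the string.
function lengthInCodePoints(str) {
  return Array.from(str).length;
}

console.log(lengthInBytes(tetragram));      // 4 (UTF-8 bytes)
console.log(lengthInCodePoints(tetragram)); // 1 (code point)
console.log(tetragram.length);              // 2 (UTF-16 code units)
```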

            Assignee:
            benjamin.murphy Benjamin Murphy
            Reporter:
            charlie.swanson@mongodb.com Charlie Swanson
            Votes:
            0
            Watchers:
            7
