Drivers / DRIVERS-82

Don't compile BSON regexes to native regexes

    • Type: Bug
    • Resolution: Done
    • Priority: Minor - P4
    • Component/s: None

      Drivers can retrieve Regex-type BSON values from MongoDB under several circumstances:

      • When a regex has been stored in a document, then queried
      • When a regex query is in progress on another connection and the driver queries $cmd.sys.inprog
      • When a regex query is stored in system.profile and the driver queries the profile

      The authors of these regexes probably wrote them as PCRE, since they are meant to run on the server, but that is not guaranteed. We can't make any predictions about the content of a BSON regex or what its author intended it to match.

      Unfortunately, most of the drivers compile BSON regexes into their native regex format, and every language has its own regex flavor. This causes four problems. If a regex can't be compiled in the local flavor, then the whole document is unparsable and there's no workaround. Even if the regex does compile in the local flavor, we driver authors don't know whether it will behave as intended, since the local flavor may differ from the flavor the regex's author wrote it for. If the local flavor doesn't support the same flags as BSON (i, l, m, s, u, x), then the driver may be unable to round-trip a regex from server to client and back again while preserving all its flags. Finally, we doubt that most regexes retrieved from the server are ever executed client-side, so eagerly compiling every regex is wasteful.
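
      As an illustration of the first problem, the sketch below uses Python's `re` module as the "local flavor". The two patterns are ordinary PCRE syntax that, at the time of writing, `re` refuses to compile. The helper name `compiles_natively` is made up for this example.

```python
import re

# Patterns that are valid PCRE but use syntax Python's `re` module rejects.
pcre_only_patterns = [
    r"\p{L}+",     # Unicode property class
    r"\((?R)?\)",  # recursive reference to the whole pattern
]


def compiles_natively(pattern: str) -> bool:
    """Return True if `pattern` compiles with the local (Python) flavor."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


for pattern in pcre_only_patterns:
    # A driver that compiles eagerly would fail to decode any document
    # containing one of these patterns.
    print(pattern, compiles_natively(pattern))
```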

      We must change the behavior in two steps.

      1. If your driver always compiles retrieved regexes, add some feature to optionally disable compilation. If compilation is disabled, represent the BSON regexes some other way, e.g. a MongoRegex class that contains the uncompiled regex pattern as a string, and its flags. (The name of the class is up to you.)

      A MongoRegex is encoded into a BSON regex, so it's a means of sending a PCRE to the server even if its pattern can't be compiled in the local flavor, or its flags aren't supported by the local flavor, or both.
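
      A minimal sketch of such a class in Python follows. It assumes the BSON layout for a regex element (type byte 0x0B, the element name, then the pattern and the options as NUL-terminated strings, with options stored in alphabetical order). The method name `to_bson_element` and the constant `BSON_REGEX_FLAGS` are hypothetical; a real driver would hook this into its own encoder.

```python
from dataclasses import dataclass

# The six flag characters a BSON regex may carry.
BSON_REGEX_FLAGS = "ilmsux"


def _cstring(text: str) -> bytes:
    """Encode `text` as a BSON cstring: UTF-8 bytes plus a trailing NUL."""
    data = text.encode("utf-8")
    if b"\x00" in data:
        raise ValueError("BSON cstrings cannot contain NUL bytes")
    return data + b"\x00"


@dataclass(frozen=True)
class MongoRegex:
    """An uncompiled regex: the pattern is kept as a plain string."""

    pattern: str
    flags: str = ""

    def __post_init__(self):
        unknown = set(self.flags) - set(BSON_REGEX_FLAGS)
        if unknown:
            raise ValueError(f"not BSON regex flags: {sorted(unknown)}")
        # Store the flags deduplicated and in alphabetical order.
        normalized = "".join(sorted(set(self.flags)))
        object.__setattr__(self, "flags", normalized)

    def to_bson_element(self, name: str) -> bytes:
        # The pattern is never compiled, so PCRE-only syntax passes through.
        return (b"\x0b" + _cstring(name)
                + _cstring(self.pattern) + _cstring(self.flags))


regex = MongoRegex(r"\p{L}+", "xi")
print(regex.flags)
print(regex.to_bson_element("name"))
```

      Because the class only stores strings, constructing and encoding it never depends on what the local regex engine accepts.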

      The MongoRegex class should have a try_compile method to convert to a native regex, with a warning in the documentation like this:

      Warning: regular expressions retrieved from the server may include a pattern that cannot be compiled into a <LANGUAGE> regular expression, or which matches a different set of strings in <LANGUAGE> than it does when used in a MongoDB query, or it may have flags that are not supported by <LANGUAGE> regular expressions. try_compile() may raise a <WHATEVER> exception.
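
      A sketch of what try_compile could do in Python, written as a free function over the pattern and flag strings to keep it short. The flag mapping and the exception name `RegexCompileError` are assumptions for this example, not part of the proposal.

```python
import re

# Assumed mapping from BSON regex flag characters to Python `re` flags.
_BSON_TO_NATIVE = {
    "i": re.IGNORECASE,
    "l": re.LOCALE,
    "m": re.MULTILINE,
    "s": re.DOTALL,
    "u": re.UNICODE,
    "x": re.VERBOSE,
}


class RegexCompileError(ValueError):
    """Hypothetical error: a BSON regex has no native equivalent."""


def try_compile(pattern: str, flags: str = "") -> re.Pattern:
    """Attempt to turn a BSON regex into a compiled Python regex."""
    native_flags = 0
    for ch in flags:
        if ch not in _BSON_TO_NATIVE:
            raise RegexCompileError(f"unsupported BSON regex flag: {ch!r}")
        native_flags |= _BSON_TO_NATIVE[ch]
    try:
        return re.compile(pattern, native_flags)
    except (re.error, ValueError) as exc:
        # re.error: the pattern is not valid in Python's flavor.
        # ValueError: the flag combination is rejected, e.g. LOCALE
        # applied to a str pattern.
        raise RegexCompileError(f"cannot compile {pattern!r}") from exc


print(try_compile("^a.c$", "is").match("A\nC") is not None)
```

      Even when this succeeds, the result is only a Python regex with the same text and roughly equivalent flags; nothing here checks that it matches the same strings the server would.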

      Add a method like MongoRegex.from_native to attempt to convert from a native regex to a MongoRegex. It should be documented like:

      Warning: <LANGUAGE> regular expressions use a different syntax and a different set of flags than BSON regular expressions. A regular expression may match different strings when executed in <LANGUAGE> than it matches when used in a MongoDB query, if it can be used in a query at all.
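
      The reverse direction might look like the following Python sketch, again as a free function that returns the pattern and BSON flag string rather than a full class. The mapping table is an assumption for this example. Note that Python sets `re.UNICODE` implicitly on every str pattern, so a "u" flag appears even when the author never asked for one, which is a small instance of the flag mismatch the warning describes.

```python
import re
from typing import Tuple

# Assumed mapping from Python `re` flags to BSON flag characters,
# listed in alphabetical order of the BSON character.
_NATIVE_TO_BSON = [
    (re.IGNORECASE, "i"),
    (re.LOCALE, "l"),
    (re.MULTILINE, "m"),
    (re.DOTALL, "s"),
    (re.UNICODE, "u"),
    (re.VERBOSE, "x"),
]


def from_native(regex: re.Pattern) -> Tuple[str, str]:
    """Extract (pattern, bson_flags) from a compiled Python regex."""
    if not isinstance(regex.pattern, str):
        raise TypeError("this sketch only handles str patterns")
    flags = "".join(ch for flag, ch in _NATIVE_TO_BSON if regex.flags & flag)
    # The pattern text is copied verbatim; it is not translated to PCRE.
    return regex.pattern, flags


print(from_native(re.compile("^abc", re.IGNORECASE | re.MULTILINE)))
```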

      2. In the next major (API-breaking) release, disable auto-compilation entirely. There shall be no option to turn automatic compilation back on. Users must retrieve MongoRegex instances and call try_compile to get native regexes.

      Native regular expressions will still be accepted by find(), insert(), update(), remove(), runCommand(), etc., but this is discouraged. Users should construct a MongoRegex from a string and flags.

            Assignee: Barrie Segal (barrie)
            Reporter: A. Jesse Jiryu Davis (jesse@mongodb.com)
            Votes: 0
            Watchers: 8
