  1. Core Server
  2. SERVER-18824

Support matching text that has embedded NUL bytes with $regex

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 3.3.2
    • Affects Version/s: None
    • Component/s: Querying
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Steps To Reproduce:

      mongodb shell test code:

      // test in version 3.0.3 and version 2.4.7

      > db.version()
      3.0.3
      > use testdb
      switched to db testdb
      > db
      testdb
      > db.test.find()
      > db.test.save({_id:"a","tag":"a"})
      WriteResult({ "nMatched" : 0, "nUpserted" : 1, "nModified" : 0, "_id" : "a" })
      > db.test.find()
      { "_id" : "a", "tag" : "a" }
      > db.test.save({_id:"a\x00","tag":"a0"})
      WriteResult({ "nMatched" : 0, "nUpserted" : 1, "nModified" : 0, "_id" : "a\u0000" })
      > db.test.find()
      { "_id" : "a", "tag" : "a" }
      { "_id" : "a\u0000", "tag" : "a0" }
      > db.test.find({_id: "a"})
      { "_id" : "a", "tag" : "a" }
      > db.test.find({_id: "a\x00"})
      { "_id" : "a\u0000", "tag" : "a0" }
      > db.test.find({_id: "a\u0000"})
      { "_id" : "a\u0000", "tag" : "a0" }
      > db.test.find({_id: "a\0"})
      { "_id" : "a\u0000", "tag" : "a0" }
      > db.test.find({_id:{"$regex":"^a"}}) // correct
      { "_id" : "a", "tag" : "a" }
      { "_id" : "a\u0000", "tag" : "a0" }
      > db.test.find({_id:{"$regex":"^a\x00"}})      // unexpected: { "_id" : "a", "tag" : "a" } should not match
      { "_id" : "a", "tag" : "a" }
      { "_id" : "a\u0000", "tag" : "a0" }
      > db.test.find({_id:{"$regex":"^a\u0000"}})   // unexpected here too: the first item should not match
      { "_id" : "a", "tag" : "a" }
      { "_id" : "a\u0000", "tag" : "a0" }
      >
      
      // test in 1.8.3 (correct)
      
      > db.version()
      1.8.3
      > use testdb
      switched to db testdb
      > db
      testdb
      > db.test.find()
      > db.test.save({_id:"a","tag":"a"})
      > db.test.find()
      { "_id" : "a", "tag" : "a" }
      > db.test.save({_id:"a\x00","tag":"a0"})
      > db.test.find()
      { "_id" : "a", "tag" : "a0" }
      > db.test.find({_id: "a"})
      { "_id" : "a", "tag" : "a0" }
      > db.test.find({_id: "a\x00"})
      { "_id" : "a", "tag" : "a0" }
      > db.test.find({_id: "a\u0000"})
      { "_id" : "a", "tag" : "a0" }
      > db.test.find({_id: "a\0"})
      { "_id" : "a", "tag" : "a0" }
      > db.test.find({_id:{"$regex":"^a"}}) // an item is missing here
      { "_id" : "a", "tag" : "a0" }
      > db.test.find({_id:{"$regex":"^a\x00"}}) // correct
      { "_id" : "a", "tag" : "a0" }
      > db.test.find({_id:{"$regex":"^a\u0000"}}) // correct
      { "_id" : "a", "tag" : "a0" }
      >
      
    • Sprint: Query 10 (02/22/16)

      MongoDB's integration with the PCRE library has a bug, present since the beginning of time, that truncates the string data stored in a document at the first NUL byte when pattern matching is attempted on it. The cause is a call that ends up using the StringPiece(const char* str) constructor, which calls strlen() and therefore makes pattern matching on the string data stop at the first NUL byte.

      We should instead use the StringPiece(const char* offset, int len) constructor and pass e.valuestrsize() - 1 as the length of the string data.
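
The difference between the two constructors can be modelled in plain JavaScript, runnable in the mongo shell or Node.js. This is only a sketch: `cStrlen`, `viewWithStrlen`, and `viewWithLength` are hypothetical helpers that mimic a strlen()-based view versus a view built from an explicit length. They are not MongoDB or PCRE APIs, and the real fix lives in the server's C++ code.

```javascript
// Hypothetical model of C's strlen(): count characters up to the first NUL.
function cStrlen(s) {
    var nul = s.indexOf("\u0000");
    return nul === -1 ? s.length : nul;
}

// View of the data as seen through a strlen()-based constructor.
function viewWithStrlen(s) {
    return s.substring(0, cStrlen(s));
}

// View of the data when the caller passes the stored length explicitly.
function viewWithLength(s, len) {
    return s.substring(0, len);
}

var stored = "a\u0000";          // the _id from the report above
var pattern = /^a\x00/;          // NUL written as an escape, not a raw byte

var truncatedView = viewWithStrlen(stored);             // NUL and everything after it is cut off
var fullView = viewWithLength(stored, stored.length);   // whole string survives

var matchesTruncated = pattern.test(truncatedView);     // false: the NUL never reaches the matcher
var matchesFull = pattern.test(fullView);               // true
```

With the truncated view, a correctly escaped pattern can never match a value containing a NUL byte, which is why the explicit-length constructor is needed.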

      Note also that PCRE patterns cannot contain embedded NUL bytes. Instead, the NUL bytes need to be escaped, for example as

      "\\000",
      "\\x00",

      etc. See my comment below for more details.
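
As a rough illustration of the escaping described above, a client could rewrite raw NUL characters in a pattern string before sending it as a $regex. The helper name `escapeNulForRegex` is hypothetical and not part of the shell or any driver. It also assumes that a raw NUL is not itself preceded by a backslash in the pattern.

```javascript
// Hypothetical helper: replace every raw NUL character in a pattern string
// with the four-character escape sequence \x00, which PCRE understands.
function escapeNulForRegex(pattern) {
    return pattern.replace(/\u0000/g, "\\x00");
}

var rawPattern = "^a\u0000";                       // contains an embedded NUL
var safePattern = escapeNulForRegex(rawPattern);   // the characters ^ a \ x 0 0

// Possible use in the shell (assumes the `test` collection from the report):
// db.test.find({_id: {$regex: safePattern}})
```

Patterns without NUL characters pass through unchanged, so the helper can be applied unconditionally.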


      Original description

      In version 2.4.7 and 3.0.3:

      When a value contains the special character '\u0000' (\x00, \0), a prefix search such as "^a\u0000" also returns items that do not have the prefix "a\u0000", such as "a".

      In version 1.8.3:

      When searching with "^a", the item "a\u0000" is not in the result set.

        1. server-18824.js
          0.5 kB
          Sam Kleinman

            Assignee: Max Hirschhorn (max.hirschhorn@mongodb.com)
            Reporter: ma6174
            Votes: 0
            Watchers: 8
