  1. Core Server
  2. SERVER-55442

Server returns invalid utf-8 in duplicate key error message after truncating user input

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: 4.4.3, 4.9.0-alpha4
    • Component/s: None

      When a unique index is defined on a collection and an insert would create a duplicate key, the server includes an excerpt of the duplicated key value in the error message.

      When the data being inserted is multi-byte UTF-8, it appears that the server truncates the data at a byte offset without regard for UTF-8 character boundaries. When the truncated data is incorporated into the error message, the message as a whole is no longer valid UTF-8.
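
      The suspected mechanism can be reproduced without a server. The sketch below is only an illustration in plain Ruby, not the server's code: `truncate_utf8` is a hypothetical helper, and the byte limits are arbitrary. It contrasts a cut at a raw byte offset with a cut that backs off to the previous character boundary, assuming the input is valid UTF-8 to begin with.

```ruby
# Hypothetical helper (not server code): cut str to at most max_bytes bytes
# without splitting a multi-byte UTF-8 character. Assumes str is valid UTF-8.
def truncate_utf8(str, max_bytes)
  return str if str.bytesize <= max_bytes

  cut = str.byteslice(0, max_bytes)
  # Drop trailing bytes until the remainder is valid UTF-8 again.
  cut = cut.byteslice(0, cut.bytesize - 1) until cut.valid_encoding?
  cut
end

rep = '(╯°□°)╯︵ ┻━┻'

# '(' is 1 byte and '╯' is 3 bytes, so a 2-byte cut lands inside '╯'.
naive = rep.byteslice(0, 2)
safe  = truncate_utf8(rep, 2)

naive.valid_encoding?  # false: ends with a lone lead byte
safe.valid_encoding?   # true: the partial character was dropped
```

      The boundary-aware version may return fewer bytes than the limit, which seems an acceptable trade-off for an excerpt in an error message.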

      Test code in Ruby:

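      require 'mongo'
      
      # No database is specified, so the driver uses its default ("admin").
      client = Mongo::Client.new(['localhost:14400'])
      
      # Start from an empty collection with a unique index on k.
      client['foo'].drop
      client['foo'].indexes.create_one({k: 1}, unique: true)
      
      # Every non-ASCII character here takes more than one byte in UTF-8.
      rep = '(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻'
      
      # The second insert violates the unique index and triggers the E11000 error.
      client['foo'].insert_one(k: rep*10)
      client['foo'].insert_one(k: rep*10)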
      

      The error message returned is:

      E11000 duplicate key error collection: admin.foo index: k_1 dup key: { k: "(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□�..." }
      

      The libbson utf-8 validator that the Ruby driver uses complains about it thusly:

      /home/w/.rbenv/versions/2.7.2/lib/ruby/gems/2.7.0/gems/bson-4.12.0/lib/bson/hash.rb:111:in `get_hash': String E11000 duplicate key error collection: admin.foo index: k_1 dup key: { k: "(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□�..." } is not valid UTF-8: bogus high bits for continuation byte (EncodingError)
      

      The error message is returned as a BSON string, which according to my understanding of http://bsonspec.org/spec.html must contain valid utf-8 characters.
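
      Until the server is fixed, a client can at least detect and repair such a message. The following is a sketch under assumptions, not driver code: `prefix` and the single stray byte `\xC2` are hypothetical stand-ins for the received message, chosen because a cut inside the 2-byte character `°` would leave exactly that lead byte. It uses only the core `String#valid_encoding?` and `String#scrub` methods.

```ruby
# Hypothetical stand-in for the server's message: valid text, then the first
# byte of a 2-byte character, then the "..." suffix the server appends.
prefix = "dup key: { k: \"(\u256F\u00B0\u25A1"
msg = (prefix.b + "\xC2".b + '..." }'.b).force_encoding(Encoding::UTF_8)

msg.valid_encoding?   # false: \xC2 is not followed by a continuation byte

# Replace each invalid byte sequence with U+FFFD (REPLACEMENT CHARACTER).
clean = msg.scrub("\uFFFD")
clean.valid_encoding? # true
```

      Scrubbing loses the partial character but keeps the rest of the message readable, which is usually what matters for an error string.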

      This was reported in https://jira.mongodb.org/browse/RUBY-2560. I verified against 2.6.12, 4.4.3 and 4.9.0-alpha5 servers.

            Assignee:
            david.storch@mongodb.com David Storch
            Reporter:
            oleg.pudeyev@mongodb.com Oleg Pudeyev (Inactive)
            Votes:
            0
            Watchers:
            9
