- Type: Bug
- Resolution: Duplicate
- Priority: Major - P3
- Affects Version/s: 4.4.3, 4.9.0-alpha4
- Component/s: None
- ALL
- Query Execution 2021-04-19
When a unique index is defined on a collection and duplicate data is inserted, the server includes an excerpt of the duplicated key in the error message.
When the inserted data contains multi-byte UTF-8 characters, the server appears to truncate it at a byte offset without regard for UTF-8 character boundaries. Once the truncated excerpt is incorporated into the error message, the message as a whole is no longer valid UTF-8.
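The suspected failure mode can be reproduced in plain Ruby without a server. The sketch below is only an illustration: the helper names `truncate_bytes_naive` and `truncate_bytes_utf8` are hypothetical, and this is not the server's actual truncation code. It contrasts a cut at a fixed byte offset with a cut that backs up to the nearest character boundary.

```ruby
# Hypothetical helpers for illustration; not taken from the server source.

# Cut at a fixed byte offset, ignoring character boundaries.
# This can leave a partial multi-byte sequence at the end.
def truncate_bytes_naive(str, max_bytes)
  str.byteslice(0, max_bytes)
end

# Cut at a byte offset, then drop trailing bytes until the result is
# valid in the string's encoding (at most 3 bytes for UTF-8).
def truncate_bytes_utf8(str, max_bytes)
  return str if str.bytesize <= max_bytes
  cut = str.byteslice(0, max_bytes)
  cut = cut.byteslice(0, cut.bytesize - 1) until cut.valid_encoding?
  cut
end

sample = '(╯°□°)'                 # '(' is 1 byte, '╯' 3, '°' 2, '□' 3
broken = truncate_bytes_naive(sample, 8)
safe   = truncate_bytes_utf8(sample, 8)

puts broken.valid_encoding?       # false: the cut lands inside '□'
puts safe.valid_encoding?         # true
puts safe                         # (╯°
```

Appending `...` to `broken` and embedding it in a larger message produces a string with the same property as the error message in this report: every other byte is fine, but the string as a whole fails UTF-8 validation.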
Test code in Ruby:
require 'mongo'

client = Mongo::Client.new(['localhost:14400'])
client['foo'].drop
client['foo'].indexes.create_one({k: 1}, unique: true)

rep = '(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻'
client['foo'].insert_one(k: rep*10)
client['foo'].insert_one(k: rep*10)
The error message returned is:
E11000 duplicate key error collection: admin.foo index: k_1 dup key: { k: "(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□�..." }
The libbson utf-8 validator that the Ruby driver uses complains about it thusly:
/home/w/.rbenv/versions/2.7.2/lib/ruby/gems/2.7.0/gems/bson-4.12.0/lib/bson/hash.rb:111:in `get_hash': String E11000 duplicate key error collection: admin.foo index: k_1 dup key: { k: "(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□�..." } is not valid UTF-8: bogus high bits for continuation byte (EncodingError)
The error message is returned as a BSON string, which, according to my understanding of http://bsonspec.org/spec.html, must contain valid UTF-8.
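For anyone who needs to read such a message on the client side in the meantime, one possible approach is to validate the raw bytes and replace any invalid sequences before using the string. The sketch below uses only core Ruby (`String#valid_encoding?` and `String#scrub`); the helper name `utf8_message` is hypothetical and is not part of the bson gem or the Ruby driver.

```ruby
# Hypothetical helper for illustration; not part of bson or the Ruby driver.
# Takes the raw bytes of a message and returns a valid UTF-8 string,
# substituting U+FFFD for any invalid byte sequences.
def utf8_message(raw_bytes)
  msg = raw_bytes.dup.force_encoding(Encoding::UTF_8)
  return msg if msg.valid_encoding?
  msg.scrub("\uFFFD")
end

# A message whose excerpt was cut inside a 3-byte character
# (E2 96 are the first two bytes of the encoding of '□').
raw = 'dup key: { k: "('.b + "\xE2\x96".b + '..." }'.b

cleaned = utf8_message(raw)
puts cleaned.valid_encoding?
puts cleaned
```

This only hides the symptom: the truncated character is lost either way, and the message on the wire still does not satisfy the BSON string requirement.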
This was reported in https://jira.mongodb.org/browse/RUBY-2560. I verified against 2.6.12, 4.4.3 and 4.9.0-alpha5 servers.
- causes: RUBY-2560 EncodingError raised when server returns invalid UTF-8 in error messages derived from user input (Backlog)
- duplicates: SERVER-24007 Server can return invalid UTF8 for error messages due to truncation in the middle of a code point (Backlog)