If the server truncates an error message in such a way that it is no longer valid UTF-8, the driver raises an EncodingError instead of a Mongo::Error::OperationFailure (or other exception). This happens when the truncation falls on a byte in the middle of a multi-byte UTF-8 character.
Example:
class MyDocument
  include Mongoid::Document
  field :name, type: String
  index({name: 1}, {unique: true})
end

MyDocument.create_indexes

MyDocument.collection.insert_one({name: "(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻"})
# this raises
# EncodingError (String E11000 duplicate key error collection: my_db.my_documents index: name_1 dup key: { : "(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□?..." } is not valid UTF-8: bogus high bits for continuation byte)
# the truncation fell in the middle of a ° character
MyDocument.collection.insert_one({name: "(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻"})

MyDocument.collection.insert_one({name: "a(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻"})
# this raises
# Mongo::Error::OperationFailure (E11000 duplicate key error collection: my_db.my_documents index: name_1 dup key: { : "a(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□..." } (11000) (on 127.0.0.1:27017, legacy retry, attempt 1))
# which is expected
MyDocument.collection.insert_one({name: "a(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻"})
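The underlying failure mode can be illustrated without a server. The following is only a minimal sketch in plain Ruby: `truncate_bytes` is a hypothetical helper that stands in for byte-based truncation and is not the server's or the driver's actual code. It shows why shifting the string by one byte (the leading "a" above) changes whether the cut lands inside a character.

```ruby
# Hypothetical helper (an assumption for illustration): cut a string
# at a byte offset without regard to character boundaries.
def truncate_bytes(str, limit)
  str.byteslice(0, limit)
end

# "°" (U+00B0) is encoded in UTF-8 as two bytes: 0xC2 0xB0.
text = "a°b"

# Cutting after 2 bytes keeps "a" plus only the first byte of "°".
broken = truncate_bytes(text, 2)

# Cutting after 3 bytes keeps "a" and the complete "°".
intact = truncate_bytes(text, 3)

puts broken.valid_encoding?
puts intact.valid_encoding?
```

The first result is invalid UTF-8 and the second is valid, which matches the behavior reported above: the same message is either decodable or not depending on which byte the truncation falls on.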
- is caused by
  - SERVER-24007 Server can return invalid UTF8 for error messages due to truncation in the middle of a code point (Backlog)
  - SERVER-55442 Server returns invalid utf-8 in duplicate key error message after truncating user input (Closed)
- is related to
  - DRIVERS-2008 Default to lossy/replacement behavior when decoding UTF-8 in writeErrors (Backlog)
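The lossy/replacement behavior named in DRIVERS-2008 could look roughly like the sketch below. This is an assumption about the general technique, not the driver's implementation: `lossy_utf8` is a hypothetical helper, and it relies only on Ruby's standard `String#force_encoding` and `String#scrub`.

```ruby
# Hypothetical helper (not a driver API): decode raw bytes as UTF-8,
# replacing any invalid or incomplete sequence with U+FFFD instead of raising.
def lossy_utf8(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8).scrub("\uFFFD")
end

# "café" truncated after the first byte of "é" (0xC3 0xA9).
raw = "caf\xC3".b

decoded = lossy_utf8(raw)

puts decoded.valid_encoding?
puts decoded
```

With this approach the truncated message stays readable and the original error class could still be raised, at the cost of losing the partially cut character.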