- Type: Bug
- Resolution: Won't Fix
- Priority: Minor - P4
- Affects Version/s: 5.1.2
- Component/s: Codecs
- Java Drivers
Summary
Inserting a BSON document with the Java String "\uD83E" (an unpaired high surrogate) as one of its field values results in the field containing invalid UTF-8 data.
Please provide the version of the driver. If applicable, please provide the MongoDB server version and topology (standalone, replica set, or sharded cluster).
Driver version: 5.1.2
MongoDB server version: 7.0.12
How to Reproduce
Run the following code:
package org.example;

import com.mongodb.ConnectionString;
import com.mongodb.client.MongoClients;
import org.bson.Document;

public class Main {
    public static void main(String[] args) {
        var connectionString = new ConnectionString(System.getenv("MONGODB_URL"));
        try (var client = MongoClients.create(connectionString)) {
            var database = client.getDatabase(connectionString.getDatabase());
            var collection = database.getCollection("_utf-8-test");
            var document = new Document();
            document.append("test", "\uD83E");
            collection.insertOne(document);
        }
    }
}
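Since this ticket was resolved as Won't Fix, one possible application-side workaround is to reject strings containing unpaired surrogates before they reach the driver. The sketch below uses only the JDK. The class name `SurrogateCheck` and both helper methods are hypothetical and are not part of the MongoDB driver API.

```java
import java.nio.charset.StandardCharsets;

public class SurrogateCheck {
    /** Returns the index of the first unpaired surrogate in s, or -1 if there is none. */
    static int firstUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the low half of a well-formed pair
                } else {
                    return i; // high surrogate with no low surrogate after it
                }
            } else if (Character.isLowSurrogate(c)) {
                return i; // low surrogate with no high surrogate before it
            }
        }
        return -1;
    }

    /** Asks the JDK's strict UTF-8 encoder whether it can encode s without replacement. */
    static boolean strictlyEncodable(String s) {
        return StandardCharsets.UTF_8.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String lone = "\uD83E";        // the value from the reproduction above
        String pair = "\uD83E\uDD14";  // the same high surrogate followed by a low surrogate

        System.out.println(firstUnpairedSurrogate(lone));
        System.out.println(firstUnpairedSurrogate(pair));
        System.out.println(strictlyEncodable(lone));
        System.out.println(strictlyEncodable(pair));
    }
}
```

An application could call `firstUnpairedSurrogate` on each string field before `document.append(...)` and either reject the value or substitute U+FFFD, depending on what the data model tolerates.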
Additional Background
If you try to read the document using the Rust driver you will get the following result:
Raw document: RawDocumentBuf {
    data: "24000000075f69640066c355aaf91cda3cd766501402746573740004000000eda0be0000",
}
Error: Error {
    kind: BsonDeserialization(DeserializationError { message: "invalid utf-8 sequence of 1 bytes from index 0" }),
    labels: {},
    wire_version: None,
    source: None
}
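In the raw document above, the value of the `test` field is stored as the three bytes `ed a0 be` (followed by the terminating `00`). That is the three-byte encoding of the surrogate code point U+D83E, which well-formed UTF-8 does not allow. The following sketch checks those bytes with the JDK's strict decoder and is meant only to illustrate why a validating reader rejects them. The class name `StrictUtf8Check` and the helper `isValidUtf8` are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8Check {
    /** Returns true if the bytes decode as UTF-8 when malformed input is reported, not replaced. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // String payload copied from the hex dump in this report.
        byte[] stored = {(byte) 0xED, (byte) 0xA0, (byte) 0xBE};
        // For comparison: a complete surrogate pair encoded by the JDK.
        byte[] wellFormed = "\uD83E\uDD14".getBytes(StandardCharsets.UTF_8);

        System.out.println(isValidUtf8(stored));
        System.out.println(isValidUtf8(wellFormed));
    }
}
```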
- duplicates
  - JAVA-4431 Driver allows inserting invalid UTF-8 strings (Closed)
- related to
  - DRIVERS-2008 Default to lossy/replacement behavior when decoding UTF-8 in writeErrors (Backlog)
  - DRIVERS-1936 Drivers should have option to disable UTF-8 validation for BSON strings (Backlog)