Java Driver / JAVA-5575

Java Driver allows inserting invalid UTF-8 as string values

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Minor - P4
    • Affects Version/s: 5.1.2
    • Component/s: Codecs
    • Java Drivers

      1. What would you like to communicate to the user about this feature?
      2. Would you like the user to see examples of the syntax and/or executable code and its output?
      3. Which versions of the driver/connector does this apply to?


      Summary

      Inserting a BSON document in which one field's value is the Java String "\uD83E" (a lone high surrogate with no following low surrogate) results in the stored field containing invalid UTF-8 data.
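To make the summary concrete: U+D83E is a UTF-16 high surrogate, so a Java String that contains it without a following low surrogate is not well-formed and has no valid UTF-8 encoding. The sketch below shows one way an application could detect such strings before handing them to the driver. The class and method names (SurrogateCheck, isWellFormedUtf16, requireWellFormed) are hypothetical and are not part of the driver API; this is an application-side guard under that assumption, not a description of how the driver behaves.

```java
/** Hypothetical application-side helper; not part of the MongoDB Java driver. */
final class SurrogateCheck {

    /** Returns true if every surrogate in s belongs to a valid high/low pair. */
    static boolean isWellFormedUtf16(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must be followed immediately by a low surrogate.
                if (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return false;
                }
                i++; // skip the low surrogate that completes the pair
            } else if (Character.isLowSurrogate(c)) {
                // A low surrogate that was not preceded by a high surrogate.
                return false;
            }
        }
        return true;
    }

    /** Returns s unchanged, or throws if s contains an unpaired surrogate. */
    static String requireWellFormed(String s) {
        if (!isWellFormedUtf16(s)) {
            throw new IllegalArgumentException("String contains an unpaired surrogate");
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(isWellFormedUtf16("\uD83E"));       // lone high surrogate
        System.out.println(isWellFormedUtf16("\uD83E\uDD14")); // complete surrogate pair
    }
}
```

In the reproduction, a call such as document.append("test", SurrogateCheck.requireWellFormed(value)) would reject the string before it reaches insertOne.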

      Please provide the version of the driver. If applicable, please provide the MongoDB server version and topology (standalone, replica set, or sharded cluster).

      Driver version: 5.1.2

      MongoDB server version: 7.0.12

      How to Reproduce

      Run the following code:

       

      package org.example;
      
      import com.mongodb.ConnectionString;
      import com.mongodb.client.MongoClients;
      import org.bson.Document;
      
      public class Main {
        public static void main(String[] args) {
          // MONGODB_URL must be set and must include a database name,
          // otherwise connectionString.getDatabase() returns null.
          var connectionString = new ConnectionString(System.getenv("MONGODB_URL"));
          try (var client = MongoClients.create(connectionString)) {
            var database = client.getDatabase(connectionString.getDatabase());
            var collection = database.getCollection("_utf-8-test");
            var document = new Document();
            // U+D83E is a lone high surrogate: the string is not well-formed UTF-16.
            document.append("test", "\uD83E");
            // The insert succeeds and the stored string value is not valid UTF-8.
            collection.insertOne(document);
          }
        }
      }

       

      Additional Background

      If you try to read the document using the Rust driver, you get the following result:

       

      Raw document: RawDocumentBuf {
          data: "24000000075f69640066c355aaf91cda3cd766501402746573740004000000eda0be0000",
      }
      Error: Error { kind: BsonDeserialization(DeserializationError { message: "invalid utf-8 sequence of 1 bytes from index 0" }), labels: {}, wire_version: None, source: None } 
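The raw document above can be read by hand: after the 4-byte length, the _id element, and the "test" key, the string value consists of the three bytes ED A0 BE followed by a null terminator. That is the three-byte form of the surrogate U+D83E, which UTF-8 does not permit (after a lead byte of ED, the next byte must be in the range 80 to 9F), and this matches the Rust error reporting an invalid sequence at index 0. The sketch below reproduces the check in Java with a strict decoder. The class name RawValueCheck and its helpers are hypothetical, and the byte offsets are hard-coded for this one document rather than obtained from a general BSON parser.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper for inspecting the raw document from the report. */
final class RawValueCheck {

    // Hex dump copied from the Rust driver output in the report.
    static final String RAW_HEX =
        "24000000075f69640066c355aaf91cda3cd766501402746573740004000000eda0be0000";

    /** Converts a hex string with an even number of digits into bytes. */
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /**
     * Extracts the bytes of the "test" string value. Layout of this document:
     * 4 (length) + 1 (type) + 4 ("_id\0") + 12 (ObjectId) + 1 (type)
     * + 5 ("test\0") + 4 (string length) = 31, then 3 value bytes.
     */
    static byte[] testFieldBytes() {
        return Arrays.copyOfRange(fromHex(RAW_HEX), 31, 34);
    }

    /** Returns true only if the bytes decode as strict UTF-8. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] value = testFieldBytes();
        System.out.printf("%02x %02x %02x%n", value[0], value[1], value[2]);
        System.out.println("valid UTF-8: " + isValidUtf8(value));
    }
}
```

The decoder is configured with CodingErrorAction.REPORT because the default String constructors silently substitute a replacement character for malformed input, which would hide the problem.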

       

            Assignee: Valentin Kavalenka (valentin.kovalenko@mongodb.com)
            Reporter: Asger Drewsen (asger@princh.com)
            Votes: 0
            Watchers: 5
