Type: Improvement
Resolution: Works as Designed
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
None

Case:

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

Schema inference uses the base type when determining the schema for arrays. For example, consider sourcing a document with the following structure:

{
    "L1": {
      "L2": {
        "L3": [ {"V2": {"K1": 0},"K1": 0},  {"V5": ["A1", "A2"], "V11": 1} ]
      }
    }
  }

L3 is inferred as an Array with a value type of Schema.STRING, so each nested document is emitted as a JSON string:

  "fullDocument": {
    "_id": "5fb67d988f8729ab566e4f6b",
    "L1": {
      "L2": {
        "L3": [ "{\"V2\": {\"K1\": 0}, \"K1\": 0}","{\"V5\": [\"A1\", \"A2\"], \"V11\": 1}" ]
      }
    }
  },
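
The conversion observed above can be reproduced with a small Python sketch. It is an illustration only, not the connector's code: `shape` and `infer_array` are hypothetical helpers, and the rule that same-shaped elements are left untouched is an assumption. Only the string fallback for mixed elements is taken from the output above.

```python
import json


def shape(value):
    """Return a hashable description of a value's structure (hypothetical helper)."""
    if isinstance(value, dict):
        return ("struct", frozenset((k, shape(v)) for k, v in value.items()))
    if isinstance(value, list):
        return ("array", frozenset(shape(v) for v in value))
    return type(value).__name__


def infer_array(values):
    """Keep elements that all share one shape (assumption); otherwise
    fall back to the base type by serialising each element to a string."""
    shapes = {shape(v) for v in values}
    if len(shapes) <= 1:
        return list(values)
    return [v if isinstance(v, str) else json.dumps(v) for v in values]


# The L3 array from the document in this issue: two structs with different fields.
l3 = [{"V2": {"K1": 0}, "K1": 0}, {"V5": ["A1", "A2"], "V11": 1}]
print(infer_array(l3))
```

Because the two elements of L3 have different fields, the sketch takes the fallback branch and returns two JSON strings, mirroring the fullDocument shown above.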

Configuration:

{
  "key.converter.schemas.enable": "false",
  "value.converter.schemas.enable": "false",
  "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
  "tasks.max": "1",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "value.converter": "org.apache.kafka.connect.json.JsonConverter",
  "errors.log.enable": "true",
  "errors.log.include.messages": "true",
  "connection.uri": "CONNECTIONSTRING",
  "database": "testdb",
  "collection": "testcol",
  "topic.prefix": "test-prefix",
  "output.format.key": "json",
  "output.format.value": "schema",
  "output.schema.infer.value": "true",
  "output.json.formatter": "com.mongodb.kafka.connect.source.json.formatter.SimplifiedJson",
  "copy.existing": "true"
}

JSON schemas do allow variable object types for Structs and Arrays (see: Array Compatibility). So when output.schema.infer.value=true and the output is JSON with a schema, inference should not fall back to a base type. Note that this would require an extra configuration option, e.g. "output.schema.infer.compatibility:[none|all]", defaulting to "all" compatibility to keep the current behaviour.
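
As a rough sketch of how such a switch could behave, the Python below infers a schema description and only applies the base-type fallback when `compatibility` is "all". Everything here is hypothetical: the `infer_schema` function, the `PRIMITIVES` type names, and the use of an `anyOf` list to describe mixed array items are assumptions made for illustration, and the option itself is only a proposal in this issue.

```python
# Illustrative type names; not taken from the connector.
PRIMITIVES = {bool: "boolean", int: "int64", float: "float64", str: "string"}


def infer_schema(value, compatibility="all"):
    """Infer a schema description for a JSON-like value (hypothetical sketch)."""
    if compatibility not in ("all", "none"):
        raise ValueError("compatibility must be 'all' or 'none'")
    if isinstance(value, dict):
        fields = {k: infer_schema(v, compatibility) for k, v in value.items()}
        return {"type": "struct", "fields": fields}
    if isinstance(value, list):
        # Collect the distinct element schemas, preserving first-seen order.
        distinct = []
        for item in value:
            schema = infer_schema(item, compatibility)
            if schema not in distinct:
                distinct.append(schema)
        if not distinct:
            items = {"type": "string"}
        elif len(distinct) == 1:
            items = distinct[0]
        elif compatibility == "all":
            items = {"type": "string"}  # base-type fallback (current behaviour)
        else:
            items = {"anyOf": distinct}  # assumed way to keep each element's type
        return {"type": "array", "items": items}
    return {"type": PRIMITIVES.get(type(value), "string")}


doc = {"L1": {"L2": {"L3": [{"V2": {"K1": 0}, "K1": 0},
                            {"V5": ["A1", "A2"], "V11": 1}]}}}


def l3_schema(mode):
    """Pull out the schema inferred for the L3 array."""
    schema = infer_schema(doc, mode)
    return schema["fields"]["L1"]["fields"]["L2"]["fields"]["L3"]


print(l3_schema("all"))
print(l3_schema("none"))
```

With "all" the L3 items collapse to a string type, as they do today; with "none" each element keeps its own struct description.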

For reference see:
https://developer.mongodb.com/community/forums/t/array-of-objects-become-array-of-string-during-upload-to-kafka/11509/3

is related to

KAFKA-343 Improve schema inference for documents nested in arrays

Closed

Assignee: Ross Lawley
Reporter: Robert Walters (Inactive)
Votes: 0
Watchers: 4

Created: Nov 19 2020 02:18:53 PM UTC
Updated: Oct 27 2023 11:54:14 AM UTC
Resolved: Jan 03 2023 04:16:16 PM UTC