Summary
Currently the source connector converts all change stream documents to extended JSON strings before publishing them to a topic.
This requires users to use the org.apache.kafka.connect.storage.StringConverter, which has space and efficiency costs in Kafka (especially for storage). Various other converters exist that allow Kafka topic subscribers to receive events in the format of their choice.
The MongoDB Source Connector should allow users to configure a number of converters such as io.confluent.connect.avro.AvroConverter, org.apache.kafka.connect.json.JsonConverter or org.apache.kafka.connect.converters.ByteArrayConverter.
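As a rough sketch of what this could look like from a user's perspective, the configuration below pairs the source connector with the JsonConverter; the converter property keys are standard Kafka Connect settings, while the connector class name and the commented-out Avro / schema registry values are illustrative assumptions rather than a committed design.

    import java.util.HashMap;
    import java.util.Map;

    final class ConverterConfigExample {
        static Map<String, String> jsonConverterProps() {
            Map<String, String> props = new HashMap<>();
            // Assumed source connector class name, for illustration only.
            props.put("connector.class", "com.mongodb.kafka.connect.MongoSourceConnector");
            // Emit structured records with an embedded schema instead of plain strings.
            props.put("value.converter", "org.apache.kafka.connect.json.JsonConverter");
            props.put("value.converter.schemas.enable", "true");
            // Avro is another option, typically paired with a schema registry:
            // props.put("value.converter", "io.confluent.connect.avro.AvroConverter");
            // props.put("value.converter.schema.registry.url", "http://localhost:8081");
            return props;
        }
    }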
Change stream events have a defined schema, although a number of the fields are optional:
    {
        _id : { <BSON Object> },
        "operationType" : "<operation>",
        "fullDocument" : { <document> },
        "ns" : {
            "db" : "<database>",
            "coll" : "<collection>"
        },
        "to" : {
            "db" : "<database>",
            "coll" : "<collection>"
        },
        "documentKey" : { "_id" : <value> },
        "updateDescription" : {
            "updatedFields" : { <document> },
            "removedFields" : [ "<field>", ... ]
        },
        "clusterTime" : <Timestamp>,
        "txnNumber" : <NumberLong>,
        "lsid" : {
            "id" : <UUID>,
            "uid" : <BinData>
        }
    }
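A minimal sketch, assuming the field layout above, of how such an event could be described with Kafka Connect's Schema API follows. The class and constant names are illustrative, and modelling the dynamic fields as optional extended JSON strings is only one possible choice (see the next point).

    import org.apache.kafka.connect.data.Schema;
    import org.apache.kafka.connect.data.SchemaBuilder;

    final class ChangeStreamSchemas {
        static final Schema NS_SCHEMA = SchemaBuilder.struct()
                .name("ns")
                .field("db", Schema.STRING_SCHEMA)
                .field("coll", Schema.OPTIONAL_STRING_SCHEMA)
                .optional()
                .build();

        static final Schema UPDATE_DESCRIPTION_SCHEMA = SchemaBuilder.struct()
                .name("updateDescription")
                // updatedFields is dynamic, so it is modelled as extended JSON here.
                .field("updatedFields", Schema.OPTIONAL_STRING_SCHEMA)
                .field("removedFields", SchemaBuilder.array(Schema.STRING_SCHEMA).optional().build())
                .optional()
                .build();

        static final Schema VALUE_SCHEMA = SchemaBuilder.struct()
                .name("changeStreamEvent")
                .field("_id", Schema.STRING_SCHEMA)                    // resume token as extended JSON
                .field("operationType", Schema.STRING_SCHEMA)
                .field("fullDocument", Schema.OPTIONAL_STRING_SCHEMA)  // dynamic data
                .field("ns", NS_SCHEMA)
                .field("to", NS_SCHEMA)
                .field("documentKey", Schema.OPTIONAL_STRING_SCHEMA)   // dynamic data
                .field("updateDescription", UPDATE_DESCRIPTION_SCHEMA)
                .field("clusterTime", Schema.OPTIONAL_INT64_SCHEMA)    // BsonTimestamp value
                .field("txnNumber", Schema.OPTIONAL_INT64_SCHEMA)
                .field("lsid", Schema.OPTIONAL_STRING_SCHEMA)          // extended JSON
                .build();
    }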
The main complexity will be deciding how to handle the dynamic document data: fullDocument, documentKey, and updateDescription.updatedFields.
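One possible approach, not a committed design, is to keep the dynamic parts as extended JSON strings inside an otherwise structured value. The sketch below uses the ChangeStreamDocument accessors from the MongoDB Java driver together with the illustrative schema above; field names and the helper class are assumptions.

    import com.mongodb.client.model.changestream.ChangeStreamDocument;
    import org.apache.kafka.connect.data.Schema;
    import org.apache.kafka.connect.data.Struct;
    import org.bson.BsonDocument;

    final class DynamicFieldHandling {
        // Illustrative only: follows the sketched schema above; error handling omitted.
        static Struct toStruct(ChangeStreamDocument<BsonDocument> event, Schema valueSchema) {
            Struct value = new Struct(valueSchema)
                    .put("_id", event.getResumeToken().toJson())
                    .put("operationType", event.getOperationType().getValue());
            // Dynamic data: the shape is collection specific, so it is serialised
            // as an extended JSON string rather than described field by field.
            if (event.getFullDocument() != null) {
                value.put("fullDocument", event.getFullDocument().toJson());
            }
            if (event.getDocumentKey() != null) {
                value.put("documentKey", event.getDocumentKey().toJson());
            }
            return value;
        }
    }

Keeping the dynamic data as strings preserves compatibility with Avro and JSON schemas at the cost of consumers having to parse those fields themselves; alternatives (for example, schemaless java.util.Map values when using the JsonConverter with schemas disabled) could be weighed as part of this work.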
A secondary aim will be to ensure that users can access change stream data in a way that is easily consumable outside of MongoDB: KAFKA-99