Kafka Connector / KAFKA-40

Support schema for Source connector

    • Type: Epic
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 1.3.0
    • Affects Version/s: None
    • Component/s: None
    • Status: Done
    • Epic Name: Support schema for Source connector

      Engineer(s): Ross

      2020-08-25:

      • The last piece of work is waiting on a review from Durran and Jeff.

      2020-07-28: Initial target of 2020-08-21 (6 Weeks)
      This project captures most of the work for the Kafka 1.3.0 release.

      Goals for the next month (a configuration sketch follows the list):

      • Support output to schema
      • Allow configuration of the Source Record Key
      • Support auto-generated schema structs
      • Allow namespace-based configuration overrides
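      As a rough illustration of where these goals are heading, the sketch below shows a source configuration expressed as a Java map. The property names (output.format.value, output.schema.infer.value) are assumptions based on the planned 1.3.0 work and should be checked against the released documentation.

      import java.util.Map;

      public class SourceConfigSketch {
          // Hypothetical 1.3.0-style configuration: schema output with an
          // inferred value schema, serialized through the Avro converter.
          // Property names are assumptions; verify against the 1.3.0 docs.
          static final Map<String, String> CONFIG = Map.of(
                  "connector.class", "com.mongodb.kafka.connect.MongoSourceConnector",
                  "connection.uri", "mongodb://localhost:27017",
                  "database", "test",
                  "collection", "data",
                  "output.format.value", "schema",
                  "output.schema.infer.value", "true",
                  "value.converter", "io.confluent.connect.avro.AvroConverter",
                  "value.converter.schema.registry.url", "http://localhost:8081");
      }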

      Completed work:

      • Add support to output bson from the source


      Summary

      Currently the source connector converts all change stream documents to extended JSON strings before publishing them to a topic.
      This requires users to use the org.apache.kafka.connect.storage.StringConverter converter, which has space and efficiency costs in Kafka (especially for storage). Various other converters allow Kafka topic subscribers to receive events in the format of their choice.
      The MongoDB Source Connector should allow users to configure a number of converters, such as io.confluent.connect.avro.AvroConverter, org.apache.kafka.connect.json.JsonConverter, or org.apache.kafka.connect.converters.ByteArrayConverter.
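      To make the converter trade-off concrete, here is a minimal, self-contained sketch (the class name, topic, and trivial schema are invented for illustration) of how org.apache.kafka.connect.json.JsonConverter serializes a schema'd Struct when schemas.enable is true: the schema is embedded alongside the payload in every record, which is what schema-aware consumers need.

      import org.apache.kafka.connect.data.Schema;
      import org.apache.kafka.connect.data.SchemaBuilder;
      import org.apache.kafka.connect.data.Struct;
      import org.apache.kafka.connect.json.JsonConverter;

      import java.nio.charset.StandardCharsets;
      import java.util.Map;

      public class ConverterDemo {
          public static void main(String[] args) {
              // A trivial schema standing in for a change stream event.
              Schema schema = SchemaBuilder.struct().name("demo.Event")
                      .field("operationType", Schema.STRING_SCHEMA)
                      .build();
              Struct value = new Struct(schema).put("operationType", "insert");

              // With schemas enabled, the converter embeds the schema
              // alongside the payload in every record it produces.
              JsonConverter converter = new JsonConverter();
              converter.configure(Map.of("schemas.enable", "true"), false);
              byte[] bytes = converter.fromConnectData("demo-topic", schema, value);
              System.out.println(new String(bytes, StandardCharsets.UTF_8));
          }
      }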

      Change stream events have a well-defined schema but contain a number of optional fields:

      {
         _id : { <BSON Object> },
         "operationType" : "<operation>",
         "fullDocument" : { <document> },
         "ns" : {
            "db" : "<database>",
            "coll" : "<collection>"
         },
         "to" : {
            "db" : "<database>",
            "coll" : "<collection>"
         },
         "documentKey" : { "_id" : <value> },
         "updateDescription" : {
            "updatedFields" : { <document> },
            "removedFields" : [ "<field>", ... ]
         },
         "clusterTime" : <Timestamp>,
         "txnNumber" : <NumberLong>,
         "lsid" : {
            "id" : <UUID>,
            "uid" : <BinData>
         }
      } 
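
      One way to express that shape in Kafka Connect terms is with org.apache.kafka.connect.data.SchemaBuilder, marking the conditionally-present fields optional. This is a sketch, not the connector's actual schema: the schema names and the choice to carry dynamic sub-documents as strings are assumptions.

      import org.apache.kafka.connect.data.Schema;
      import org.apache.kafka.connect.data.SchemaBuilder;

      public class ChangeStreamSchemaSketch {
          // Nested namespace document, e.g. { "db": "...", "coll": "..." };
          // used by both "ns" and the rename-only "to" field.
          static final Schema NAMESPACE_SCHEMA = SchemaBuilder.struct()
                  .name("mongodb.Namespace")
                  .field("db", Schema.STRING_SCHEMA)
                  .field("coll", Schema.OPTIONAL_STRING_SCHEMA)
                  .optional()
                  .build();

          // A plausible Connect schema for change stream events. Fields that
          // only appear for certain operation types are declared optional;
          // dynamic sub-documents are carried as JSON strings (an assumption).
          static final Schema CHANGE_EVENT_SCHEMA = SchemaBuilder.struct()
                  .name("mongodb.ChangeStreamEvent")
                  .field("_id", Schema.STRING_SCHEMA)
                  .field("operationType", Schema.STRING_SCHEMA)
                  .field("fullDocument", Schema.OPTIONAL_STRING_SCHEMA)
                  .field("ns", NAMESPACE_SCHEMA)
                  .field("to", NAMESPACE_SCHEMA)
                  .field("documentKey", Schema.OPTIONAL_STRING_SCHEMA)
                  .field("updateDescription", Schema.OPTIONAL_STRING_SCHEMA)
                  .field("clusterTime", Schema.OPTIONAL_STRING_SCHEMA)
                  .field("txnNumber", Schema.OPTIONAL_INT64_SCHEMA)
                  .field("lsid", Schema.OPTIONAL_STRING_SCHEMA)
                  .build();
      }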

       

      The main complexity will be deciding how to handle dynamic document data: fullDocument, documentKey, and updateDescription.updatedFields.
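
      One pragmatic fallback for those dynamic fields, sketched below under the assumption that they are declared as optional strings in the Connect schema, is to render them as extended JSON via the BSON library's JsonWriterSettings, so no per-collection schema is required.

      import org.bson.BsonDocument;
      import org.bson.json.JsonMode;
      import org.bson.json.JsonWriterSettings;

      public class DynamicFieldFallback {
          private static final JsonWriterSettings RELAXED =
                  JsonWriterSettings.builder().outputMode(JsonMode.RELAXED).build();

          // fullDocument, documentKey and updateDescription.updatedFields can
          // hold arbitrary documents, so emit them as extended JSON strings
          // rather than describing them with a fixed struct schema.
          static String asJsonString(BsonDocument dynamic) {
              return dynamic == null ? null : dynamic.toJson(RELAXED);
          }
      }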

       

      A secondary aim will be to ensure users can access data from the change stream in a way that is easily consumable outside of MongoDB: KAFKA-99.

            Assignee: Ross Lawley (ross@mongodb.com)
            Reporter: Scott L'Hommedieu (scott.lhommedieu@mongodb.com) (Inactive)
            Votes: 12
            Watchers: 15
