Spark Connector / SPARK-311

Support BsonTypes that aren't natively supported in Spark

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Unknown
    • Fix Version/s: 10.1.0
    • Affects Version/s: None
    • Component/s: Source

      Adds two new configurations:

      1. spark.mongodb.read.outputExtendedJson=<true/false>
      2. spark.mongodb.write.convertJson=<true/false>

      Set spark.mongodb.read.outputExtendedJson=true to ensure round-tripping of all BSON data types.
      Set spark.mongodb.write.convertJson=true to process strings that may contain JSON (including extended JSON).

      Note: To preserve backwards compatibility, both new settings default to false.


      ----------
      Summary of changes:

      Spark has a fixed type system: https://spark.apache.org/docs/latest/sql-ref-datatypes.html

      As such, some BSON types are not supported by Spark, e.g. ObjectId.

      This change allows users to use Spark while still supporting all BSON data types.

      spark.mongodb.read.outputExtendedJson
      If true, unsupported types are converted into extended JSON strings when reading data into Spark.
      If false, the original relaxed JSON format is used for unsupported types (see relaxed JSON formatting below).
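
      As a hedged illustration of the round-tripping point above (the hex value and the use of plain `json` rather than a BSON library are assumptions for the sketch), an ObjectId rendered as canonical extended JSON keeps its "$oid" type tag, so nothing is lost when the string is serialized and parsed again:

      ```python
      import json

      # Illustrative sketch only: the connector works on BSON, not Python
      # dicts, and the hex value below is made up.
      ext_json = '{"$oid": "5f50c31e8e6f2a1b3c4d5e6f"}'

      # With outputExtendedJson=true, Spark would receive the string above
      # rather than a bare hex string, so the type tag survives.
      parsed = json.loads(ext_json)
      assert "$oid" in parsed                 # type information preserved
      round_tripped = json.dumps(parsed)      # can be written back losslessly
      assert json.loads(round_tripped) == parsed
      ```

      By contrast, a bare hex string produced by the relaxed conversion cannot be distinguished from an ordinary string on the way back in.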

      When writing from Spark into MongoDB, string conversion is controlled by:

      spark.mongodb.write.convertJson
      If true, strings are parsed and converted to the corresponding BSON type when they contain valid (extended) JSON.
      If false, strings are written as-is.
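
      A minimal sketch of the convertJson=true behaviour (the function name is hypothetical, and plain `json` stands in for the connector's BSON parsing): try to parse the string, and fall back to the unchanged string when it is not JSON:

      ```python
      import json

      def convert_if_extended_json(value: str):
          """Hypothetical helper sketching convertJson=true: if the string
          parses as JSON (including extended JSON), return the parsed value
          so it can be stored as a real BSON type; otherwise return the
          string unchanged, as convertJson=false would."""
          try:
              return json.loads(value)
          except (ValueError, TypeError):
              return value

      # An extended JSON string becomes a structured value...
      oid = convert_if_extended_json('{"$oid": "5f50c31e8e6f2a1b3c4d5e6f"}')
      assert oid == {"$oid": "5f50c31e8e6f2a1b3c4d5e6f"}

      # ...while an ordinary string passes through untouched.
      assert convert_if_extended_json("plain text") == "plain text"
      ```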

      Relaxed JSON formatting

      Any BSON type can be converted to a String. Relaxed conversion uses the Relaxed Extended JSON Format with a few extra usability changes for certain types:

      • Binary - Base64 string
      • DateTime - ISO-8601-formatted string
      • Decimal128 - String value
      • ObjectId - Hex string
      • Symbol - String value

            Assignee: Ross Lawley (ross@mongodb.com)
            Reporter: Ross Lawley (ross@mongodb.com)
            Votes: 0
            Watchers: 8
