Spark Connector / SPARK-311

Support BsonTypes that aren't natively supported in Spark

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Unknown
    • Fix Version/s: 10.1.0
    • Affects Version/s: None
    • Component/s: Source

      Adds two new configurations:

      1. spark.mongodb.read.outputExtendedJson=<true/false>
      2. spark.mongodb.write.convertJson=<true/false>

      Set spark.mongodb.read.outputExtendedJson=true to ensure round-tripping of all BSON data types.
      Set spark.mongodb.write.convertJson=true to process strings that may contain JSON (including extended JSON).

      Note: To preserve backwards compatibility, both new settings default to false.


      ----------
      Summary of changes:

      Spark has a fixed type system: https://spark.apache.org/docs/latest/sql-ref-datatypes.html

      As such, some BSON types are not supported by Spark, e.g. ObjectId.

      This change allows users to use Spark while still supporting all BSON data types.

      spark.mongodb.read.outputExtendedJson
      If true, unsupported types are converted into extended JSON strings when reading data into Spark.
      If false, the original relaxed JSON format is used for unsupported types (see relaxed JSON formatting below).
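
      As a hedged illustration of the round-tripping point above (the hex value and the use of plain `json` rather than a BSON library are assumptions for the sketch), an ObjectId rendered as canonical extended JSON keeps its "$oid" type tag, so nothing is lost when the string is serialized and parsed again:

      ```python
      import json

      # Illustrative sketch only: the connector works on BSON, not Python
      # dicts, and the hex value below is made up.
      ext_json = '{"$oid": "5f50c31e8e6f2a1b3c4d5e6f"}'

      # With outputExtendedJson=true, Spark would receive the string above
      # rather than a bare hex string, so the type tag survives.
      parsed = json.loads(ext_json)
      assert "$oid" in parsed                 # type information preserved
      round_tripped = json.dumps(parsed)      # can be written back losslessly
      assert json.loads(round_tripped) == parsed
      ```

      By contrast, a bare hex string produced by the relaxed conversion cannot be distinguished from an ordinary string on the way back in.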

      When writing from Spark into MongoDB, string conversion is controlled by:

      spark.mongodb.write.convertJson
      If true, strings are parsed and converted to the corresponding BSON type when they contain valid (extended) JSON.
      If false, strings are written as-is.
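
      A minimal sketch of the convertJson=true behaviour (the function name is hypothetical, and plain `json` stands in for the connector's BSON parsing): try to parse the string, and fall back to the unchanged string when it is not JSON:

      ```python
      import json

      def convert_if_extended_json(value: str):
          """Hypothetical helper sketching convertJson=true: if the string
          parses as JSON (including extended JSON), return the parsed value
          so it can be stored as a real BSON type; otherwise return the
          string unchanged, as convertJson=false would."""
          try:
              return json.loads(value)
          except (ValueError, TypeError):
              return value

      # An extended JSON string becomes a structured value...
      oid = convert_if_extended_json('{"$oid": "5f50c31e8e6f2a1b3c4d5e6f"}')
      assert oid == {"$oid": "5f50c31e8e6f2a1b3c4d5e6f"}

      # ...while an ordinary string passes through untouched.
      assert convert_if_extended_json("plain text") == "plain text"
      ```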

      Relaxed JSON formatting

      Any BSON type can be converted to a String. Relaxed conversion uses the Relaxed Extended JSON Format with a few extra usability changes for certain types:

      • Binary - Base64 string
      • DateTime - ISO-8601-formatted string
      • Decimal128 - String value
      • ObjectId - Hex string
      • Symbol - String value

            Assignee: Ross Lawley (ross@mongodb.com)
            Reporter: Ross Lawley (ross@mongodb.com)
            Votes: 0
            Watchers: 8
