Adds two new configurations:
1. spark.mongodb.read.outputExtendedJson=<true/false>
2. spark.mongodb.write.convertJson=<true/false>
spark.mongodb.read.outputExtendedJson=true should be used to ensure round-tripping of all BSON data types.
spark.mongodb.write.convertJson=true should be used to process strings that may contain JSON (including extended JSON).
Note: to preserve backwards compatibility, both new settings default to false.
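For example, the options can be set per read/write in PySpark (a hypothetical usage sketch; the `spark` session, URI, and collection are illustrative and not part of this change):

```python
# Hypothetical PySpark sketch: enabling the two new options.
# Assumes an existing SparkSession `spark` configured with a
# MongoDB connection URI; names here are illustrative only.
df = (
    spark.read.format("mongodb")
    # Unsupported BSON types arrive as extended JSON strings.
    .option("spark.mongodb.read.outputExtendedJson", "true")
    .load()
)

(
    df.write.format("mongodb")
    # Strings containing (extended) JSON are converted back to BSON types.
    .option("spark.mongodb.write.convertJson", "true")
    .mode("append")
    .save()
)
```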
----------
Summary of changes:
Spark has a fixed type system: https://spark.apache.org/docs/latest/sql-ref-datatypes.html
As such, some BSON types are not supported by Spark, e.g. ObjectId.
This change lets users work within Spark's type system while still supporting all BSON data types.
spark.mongodb.read.outputExtendedJson
If true, unsupported types are converted to extended JSON strings when data is read into Spark.
If false, the original relaxed JSON format is used for unsupported types (see "Relaxed JSON formatting" below).
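As an illustration of the difference, here is how a single ObjectId field might appear in a Spark string column under each setting (plain stdlib sketch; the hex value is made up):

```python
import json

# Illustrative only: one ObjectId field as it might surface in a Spark
# string column. The hex value below is invented for the example.
oid_hex = "5f3c8a9b2e1d4c0012345678"

# outputExtendedJson=true: the value arrives as an extended JSON string.
extended = json.dumps({"$oid": oid_hex})

# outputExtendedJson=false: relaxed conversion yields the plain hex string.
relaxed = oid_hex

print(extended)  # {"$oid": "5f3c8a9b2e1d4c0012345678"}
print(relaxed)   # 5f3c8a9b2e1d4c0012345678
```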
When writing from Spark into MongoDB, string handling is controlled by:
spark.mongodb.write.convertJson
If true, each string is parsed and, if it contains (extended) JSON, converted to the corresponding BSON type.
If false, strings are written unchanged.
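The convertJson=true behavior can be sketched as "parse if possible, otherwise pass through". This is a hedged stdlib approximation (`maybe_convert` is a made-up name; the real connector converts extended JSON into BSON types, for which the plain `json` module stands in here):

```python
import json

def maybe_convert(value):
    """Sketch of convertJson=true: try to parse the string as JSON and
    return the parsed value; on failure, keep the original string.
    Hypothetical helper, not the connector's actual implementation."""
    try:
        return json.loads(value)
    except (json.JSONDecodeError, TypeError):
        return value

print(maybe_convert('{"a": 1}'))   # parsed into a structured value
print(maybe_convert("not json"))   # left as the original string
```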
Relaxed JSON formatting
Any BSON type can be converted to String. Relaxed conversion uses the Relaxed Extended JSON Format with a few extra usability changes for certain types:
- Binary - Base64-encoded string
- DateTime - ISO-8601 formatted string
- Decimal128 - String value
- ObjectId - Hex string
- Symbol - String value
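Two of the conversions above can be illustrated with the standard library (sample values are made up; this only shows the string shapes, not the connector's code):

```python
import base64
from datetime import datetime, timezone

# Binary -> Base64 string (sample bytes are illustrative).
raw = b"\x00\x01\x02"
binary_as_string = base64.b64encode(raw).decode("ascii")

# DateTime -> ISO-formatted string (sample timestamp is illustrative).
dt = datetime(2020, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
datetime_as_string = dt.isoformat()

print(binary_as_string)    # AAEC
print(datetime_as_string)  # 2020-01-02T03:04:05+00:00
```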