- Type: New Feature
- Resolution: Duplicate
- Priority: Critical - P2
- Affects Version/s: 10.0.2
- Component/s: Schema
Updating documents whose _id is an ObjectId creates duplicates instead of overwriting the existing documents.
The cause is the lack of support for the BsonObjectId type, which is read as a String. When the data is written back, the _id no longer matches the original (String != ObjectId), so each document is inserted as a new one rather than replacing the original.
Example:
Dataset<Row> data = sparkSession.read().format("mongodb").option("connection.uri", mongoUri).load();
data.write().format("mongodb").option("connection.uri", mongoUri).mode(SaveMode.Append).save();
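The mismatch can be illustrated without a MongoDB instance. The sketch below is only a model of the behavior described above, not connector code: `FakeObjectId`, `ObjectIdMismatchSketch`, and `countAfterRoundTrip` are hypothetical names, and an in-memory map keyed by _id stands in for the collection. It assumes the write replaces a document only when the _id matches in both type and value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

final class ObjectIdMismatchSketch {

    // Hypothetical stand-in for a BSON ObjectId: a typed wrapper around a hex string.
    static final class FakeObjectId {
        final String hex;

        FakeObjectId(String hex) {
            this.hex = hex;
        }

        @Override
        public boolean equals(Object other) {
            // Equal only to another FakeObjectId with the same hex, never to a plain String.
            return other instanceof FakeObjectId && ((FakeObjectId) other).hex.equals(hex);
        }

        @Override
        public int hashCode() {
            return Objects.hash(hex);
        }
    }

    // Simulates read-then-write against a collection keyed by _id and
    // returns the number of documents after the write.
    static int countAfterRoundTrip(boolean preserveIdType) {
        Map<Object, String> collection = new LinkedHashMap<>();
        collection.put(new FakeObjectId("507f1f77bcf86cd799439011"), "original");

        // Read: the _id either keeps its type or is flattened to a String (the reported behavior).
        Object readId = preserveIdType
                ? new FakeObjectId("507f1f77bcf86cd799439011")
                : "507f1f77bcf86cd799439011";

        // Write back: replaces the entry only if the key matches an existing _id.
        collection.put(readId, "updated");
        return collection.size();
    }

    public static void main(String[] args) {
        System.out.println(countAfterRoundTrip(false)); // 2: String _id does not match, duplicate created
        System.out.println(countAfterRoundTrip(true));  // 1: typed _id matches, document replaced
    }
}
```

In this model the duplicate disappears as soon as the _id keeps its type across the read, which is the behavior the ticket asks for.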
- duplicates
  - SPARK-311 Support BsonTypes that aren't natively supported in Spark (Closed)