- Type: Task
- Resolution: Duplicate
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Environment: Spark Connector
The customer is experiencing a duplicate key exception when attempting to execute MongoSpark.save(rdd, writeConfig)

    def save[D: ClassTag](rdd: RDD[D], writeConfig: WriteConfig): Unit

and encountering documents that already exist in the target collection (same _id).

Looking at MongoSpark.scala, there is a code path for

    def save[D](dataset: Dataset[D], writeConfig: WriteConfig): Unit

that checks for the replaceDocument option. This check is not present in the RDD code path.

Can this be added? Is there a specific reason it is disallowed? Are there other workarounds for this?
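
A possible workaround, shown as a minimal sketch below, is to convert the RDD to a Dataset so the save goes through the Dataset code path, which checks the replaceDocument write option and replaces existing documents with a matching _id instead of inserting them. The Record case class, the connection URI, and the sample data are illustrative assumptions, not taken from the customer's environment.

    // Sketch of a possible workaround; names and data here are assumptions.
    // Converting the RDD to a Dataset routes the save through the Dataset
    // code path, which honours the replaceDocument option.
    import com.mongodb.spark.MongoSpark
    import com.mongodb.spark.config.WriteConfig
    import org.apache.spark.sql.SparkSession

    case class Record(_id: String, value: Int)

    object ReplaceSaveSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("replace-save-sketch").getOrCreate()
        import spark.implicits._

        // Assumed sample data; some _id values may already exist in the collection.
        val rdd = spark.sparkContext.parallelize(Seq(Record("a", 1), Record("b", 2)))

        // replaceDocument defaults to true; it is set explicitly here for clarity.
        // The URI (database "test", collection "records") is a placeholder.
        val writeConfig = WriteConfig(Map(
          "uri" -> "mongodb://localhost/test.records",
          "replaceDocument" -> "true"
        ))

        // The Dataset overload replaces documents with a matching _id rather than
        // raising a duplicate key exception, unlike save(rdd, writeConfig).
        MongoSpark.save(rdd.toDS(), writeConfig)

        spark.stop()
      }
    }

This is only a sketch under the stated assumptions; whether it is acceptable depends on the cost of the extra RDD-to-Dataset conversion for the customer's workload.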
- is duplicated by: SPARK-280 Enhance save(RDD) to avoid duplicate key exception (Closed)