Spark Connector / SPARK-279

Duplicate key exception when using Spark Connector save with RDD

    • Type: Task
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Environment:
      Spark Connector

      A customer is hitting a duplicate key exception when calling MongoSpark.save(rdd, writeConfig)

      def save[D: ClassTag](rdd: RDD[D], writeConfig: WriteConfig): Unit

      and the RDD contains documents that already exist in the target collection (same _id).
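      A minimal sketch of the failing call, assuming connector 2.x, a local mongod, and a hypothetical test.coll collection that already contains a document with _id "sensor-1":

      import org.apache.spark.sql.SparkSession
      import org.bson.Document
      import com.mongodb.spark.MongoSpark
      import com.mongodb.spark.config.WriteConfig

      val spark = SparkSession.builder()
        .master("local[*]")
        .appName("spark-279-repro")
        .config("spark.mongodb.output.uri", "mongodb://localhost:27017/test.coll") // hypothetical URI
        .getOrCreate()

      val writeConfig = WriteConfig(spark.sparkContext)

      // A document whose _id already exists in test.coll
      val rdd = spark.sparkContext.parallelize(Seq(
        new Document("_id", "sensor-1").append("status", "updated")
      ))

      // The RDD path inserts unconditionally, so this fails with an E11000 duplicate key error
      MongoSpark.save(rdd, writeConfig)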

      Looking at MongoSpark.scala, it appears that there is a code path for

      def save[D](dataset: Dataset[D], writeConfig: WriteConfig): Unit

      that checks the replaceDocument option. This check is not present in the RDD code path.
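      For comparison, a sketch of the Dataset path with the option set explicitly, assuming the 2.x WriteConfig API and a hypothetical Sensor case class; with replaceDocument enabled (the default), existing documents with a matching _id are replaced instead of re-inserted:

      import com.mongodb.spark.MongoSpark
      import com.mongodb.spark.config.WriteConfig

      case class Sensor(_id: String, status: String)   // hypothetical schema

      // Inherit uri/database/collection from the SparkConf and set replaceDocument explicitly
      val dsWriteConfig = WriteConfig(
        Map("replaceDocument" -> "true"),
        Some(WriteConfig(spark.sparkContext)))

      import spark.implicits._
      val ds = spark.createDataset(Seq(Sensor("sensor-1", "updated")))

      // Existing documents with the same _id are replaced; no duplicate key error
      MongoSpark.save(ds, dsWriteConfig)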

      Can this check be added to the RDD code path? Is there a specific reason it is disallowed there? Are there other workarounds?
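      One possible workaround, assuming connector 2.x, Java driver 3.7+ (for ReplaceOptions), and the RDD[Document] from the sketch above: perform the upserts manually through MongoConnector, replacing by _id so that existing documents are overwritten rather than re-inserted.

      import scala.collection.JavaConverters._
      import org.bson.Document
      import com.mongodb.client.MongoCollection
      import com.mongodb.client.model.{ReplaceOneModel, ReplaceOptions}
      import com.mongodb.spark.MongoConnector
      import com.mongodb.spark.config.WriteConfig

      val writeConfig = WriteConfig(spark.sparkContext)

      rdd.foreachPartition { partition: Iterator[Document] =>
        if (partition.nonEmpty) {
          MongoConnector(writeConfig.asOptions).withCollectionDo(writeConfig,
            { collection: MongoCollection[Document] =>
              partition.grouped(512).foreach { batch =>
                val replacements = batch.map { doc =>
                  new ReplaceOneModel[Document](
                    new Document("_id", doc.get("_id")),   // match on _id
                    doc,
                    new ReplaceOptions().upsert(true))     // insert if missing, replace if present
                }
                collection.bulkWrite(replacements.toList.asJava)
              }
            })
        }
      }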

            Assignee:
            ross@mongodb.com Ross Lawley
            Reporter:
            steffan.mejia@mongodb.com Steffan Mejia
            Votes:
            0
            Watchers:
            2
