SPARK-157: Exception when iterating over collection with single record

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 2.2.2, 2.1.2
    • Affects Version/s: 2.2.1
    • Component/s: Partitioners
    • Labels: None

      Using the Mongo Spark connector, we received the following exception when attempting to read a collection containing a single record.

      We configured the ReadConfig to use the MongoPaginateByCountPartitioner; a sketch of that setup is shown below.
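
      For reference, a minimal configuration along these lines matches the setup described above. This is only a sketch: the connection URI, database/collection names, and output path are placeholders, not our actual values.

{code:scala}
import com.mongodb.spark.MongoSpark
import com.mongodb.spark.config.ReadConfig
import org.apache.spark.sql.SparkSession

// Placeholder URI pointing at a collection that contains a single document
val spark = SparkSession.builder()
  .appName("single-record-read")
  .config("spark.mongodb.input.uri", "mongodb://localhost/test.singleRecord")
  .getOrCreate()

// Override the default partitioner with MongoPaginateByCountPartitioner
val readConfig = ReadConfig(
  Map("partitioner" -> "MongoPaginateByCountPartitioner"),
  Some(ReadConfig(spark.sparkContext))
)

// Loading the collection and writing it out forces partition calculation,
// which is where the exception below is thrown
val df = MongoSpark.load(spark.sparkContext, readConfig).toDF()
df.write.parquet("/tmp/single-record-out")
{code}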

      {{...
      Caused by: java.util.NoSuchElementException: next on empty iterator
      at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
      at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
      at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:63)
      at scala.collection.IterableLike$class.head(IterableLike.scala:107)
      at scala.collection.mutable.ArrayOps$ofRef.scala$collection$IndexedSeqOptimized$$super$head(ArrayOps.scala:186)
      at scala.collection.IndexedSeqOptimized$class.head(IndexedSeqOptimized.scala:126)
      at scala.collection.mutable.ArrayOps$ofRef.head(ArrayOps.scala:186)
      at com.mongodb.spark.rdd.partitioner.PartitionerHelper$.setLastBoundaryToLessThanOrEqualTo(PartitionerHelper.scala:127)
      at com.mongodb.spark.rdd.partitioner.MongoPaginateByCountPartitioner.partitions(MongoPaginateByCountPartitioner.scala:85)
      at com.mongodb.spark.rdd.MongoRDD.getPartitions(MongoRDD.scala:137)
      at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
      at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
      at scala.Option.getOrElse(Option.scala:121)
      at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
      at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
      at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
      at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
      at scala.Option.getOrElse(Option.scala:121)
      at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
      at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
      at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
      at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
      at scala.Option.getOrElse(Option.scala:121)
      at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
      at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
      at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
      at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
      at scala.Option.getOrElse(Option.scala:121)
      at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
      at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:185)
      at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:194)
      at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:80)
      at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:75)
      at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:98)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:124)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:119)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$3.apply(SparkPlan.scala:153)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
      at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:150)
      at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:119)
      at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:109)
      at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:109)
      at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:617)
      at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:617)
      at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:80)
      at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:99)
      at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:617)
      at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:242)
      at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:220)
      at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:516)
      ...}}
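
      The failing frame is {{PartitionerHelper$.setLastBoundaryToLessThanOrEqualTo}}, which calls {{head}} on an array of partition boundaries that is evidently empty for a single-record collection. For illustration, {{head}} on an empty Scala {{Array}} produces exactly this error (the variable name here is hypothetical, not taken from the connector source):

{code:scala}
// head on an empty Array delegates to IterableLike.head, which calls
// iterator.next() and throws:
//   java.util.NoSuchElementException: next on empty iterator
val emptyBoundaries = Array.empty[Int] // stands in for an empty boundary array
emptyBoundaries.head
{code}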

            Assignee: Ross Lawley (ross@mongodb.com)
            Reporter: Erik Dreyer (erik@shopximity.com)
            Votes: 0
            Watchers: 2
