  1. Spark Connector
  2. SPARK-412

inferSchema does not replace PLACEHOLDERs on nested structures

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 10.2.1
    • Affects Version/s: 10.1.0, 10.1.1, 10.2.0
    • Component/s: Schema
    • None
    • Not Needed

      1. What would you like to communicate to the user about this feature?
      2. Would you like the user to see examples of the syntax and/or executable code and its output?
      3. Which versions of the driver/connector does this apply to?


       

      The customer receives the following error when trying to read a collection:

       

      Py4JJavaError: An error occurred while calling o6680.load.
      : scala.MatchError: com.mongodb.spark.sql.connector.schema.InferSchema$1@55392bd7 (of class com.mongodb.spark.sql.connector.schema.InferSchema$1)
          at org.apache.spark.sql.catalyst.encoders.RowEncoder$.encoderForDataType(RowEncoder.scala:80)
          at org.apache.spark.sql.catalyst.encoders.RowEncoder$.encoderForDataType(RowEncoder.scala:116)
          at org.apache.spark.sql.catalyst.encoders.RowEncoder$.$anonfun$encoderForDataType$2(RowEncoder.scala:129)
          at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
          at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
          at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
          at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
          at scala.collection.TraversableLike.map(TraversableLike.scala:286)
          at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
          at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
          at org.apache.spark.sql.catalyst.encoders.RowEncoder$.encoderForDataType(RowEncoder.scala:126)
          at org.apache.spark.sql.catalyst.encoders.RowEncoder$.$anonfun$encoderForDataType$2(RowEncoder.scala:129)

       

      The affected document structure appears to contain an empty array nested within other elements.

      Reviewing the code, it appears that inferSchema does not recursively walk the structure to replace all the placeholders; it only replaces those at the first level.
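      To illustrate the kind of recursive walk described above, here is a minimal, self-contained sketch. It does not use the connector's or Spark's real classes: `Type`, `Simple`, `ArrayOf`, `Field`, `Struct`, and `replacePlaceholders` are all hypothetical stand-ins. It also assumes a placeholder should become a string type, which is what the suggested test for `{arrayField: [[]]}` expects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of a schema type tree.
// These are NOT the connector's or Spark's actual types.
class PlaceholderSketch {

    interface Type {}

    // PLACEHOLDER models the marker left behind for an empty array's element type.
    enum Simple implements Type { STRING, PLACEHOLDER }

    static final class ArrayOf implements Type {
        final Type element;
        ArrayOf(Type element) { this.element = element; }
        @Override public String toString() { return "array<" + element + ">"; }
    }

    static final class Field {
        final String name;
        final Type type;
        Field(String name, Type type) { this.name = name; this.type = type; }
        @Override public String toString() { return name + ": " + type; }
    }

    static final class Struct implements Type {
        final List<Field> fields;
        Struct(List<Field> fields) { this.fields = fields; }
        @Override public String toString() { return "struct<" + fields + ">"; }
    }

    // Walks the whole tree and swaps every PLACEHOLDER for STRING (assumed default),
    // descending into array element types and struct fields at any depth.
    static Type replacePlaceholders(Type type) {
        if (type == Simple.PLACEHOLDER) {
            return Simple.STRING;
        }
        if (type instanceof ArrayOf) {
            return new ArrayOf(replacePlaceholders(((ArrayOf) type).element));
        }
        if (type instanceof Struct) {
            List<Field> replaced = new ArrayList<>();
            for (Field f : ((Struct) type).fields) {
                replaced.add(new Field(f.name, replacePlaceholders(f.type)));
            }
            return new Struct(replaced);
        }
        return type; // other leaf types are returned unchanged
    }

    // Returns true if a placeholder survives anywhere in the tree.
    static boolean containsPlaceholder(Type type) {
        if (type == Simple.PLACEHOLDER) {
            return true;
        }
        if (type instanceof ArrayOf) {
            return containsPlaceholder(((ArrayOf) type).element);
        }
        if (type instanceof Struct) {
            for (Field f : ((Struct) type).fields) {
                if (containsPlaceholder(f.type)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Models the type inferred for {arrayField: [[]]} before replacement.
        Type inferred = new Struct(Collections.singletonList(
            new Field("arrayField", new ArrayOf(new ArrayOf(Simple.PLACEHOLDER)))));
        System.out.println(replacePlaceholders(inferred));
    }
}
```

      A replacement that only inspects the top-level fields would leave the inner `PLACEHOLDER` in place here, because it sits two array levels below `arrayField`.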

      I suggest adding the following test case to InferSchemaTest.java:

      assertAll(() ->
          assertEquals(
              createStructType(singletonList(createStructField(
                  "arrayField",
                  createArrayType(createArrayType(DataTypes.StringType, true), true)))),
              InferSchema.inferSchema(
                  singletonList(BsonDocument.parse("{arrayField: [[]]}")), READ_CONFIG)));

      If you inspect the failure of this test, you'll see that the comparison fails because the expected StringType does not match the placeholder object, indicating that the placeholder was never replaced.
       
       

            Assignee: Ross Lawley (ross@mongodb.com)
            Reporter: Michael Verrilli (michael.verrilli@databricks.com)
            Votes: 0
            Watchers: 8

              Created:
              Updated:
              Resolved: