The customer receives the following error when trying to read a collection:
Py4JJavaError: An error occurred while calling o6680.load.
: scala.MatchError: com.mongodb.spark.sql.connector.schema.InferSchema$1@55392bd7 (of class com.mongodb.spark.sql.connector.schema.InferSchema$1)
    at org.apache.spark.sql.catalyst.encoders.RowEncoder$.encoderForDataType(RowEncoder.scala:80)
    at org.apache.spark.sql.catalyst.encoders.RowEncoder$.encoderForDataType(RowEncoder.scala:116)
    at org.apache.spark.sql.catalyst.encoders.RowEncoder$.$anonfun$encoderForDataType$2(RowEncoder.scala:129)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
    at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
    at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
    at scala.collection.TraversableLike.map(TraversableLike.scala:286)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
    at org.apache.spark.sql.catalyst.encoders.RowEncoder$.encoderForDataType(RowEncoder.scala:126)
    at org.apache.spark.sql.catalyst.encoders.RowEncoder$.$anonfun$encoderForDataType$2(RowEncoder.scala:129)
It appears the structure involves an empty array nested within other elements.
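Reviewing your code here, it appears that it does not walk the structure recursively when replacing placeholders; only placeholders at the first level are replaced, so a placeholder nested more deeply (such as the element type of an empty array inside another array) survives into the final schema.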
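To illustrate the kind of recursive walk meant here, below is a minimal, self-contained sketch. It does not use the connector's or Spark's classes: Type, Placeholder, ArrayType, StructType, Field and replacePlaceholders are all hypothetical stand-ins invented for this example, and falling back to a string type for an unresolved placeholder is an assumption taken from the expected schema in the suggested test case.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only: these types are hypothetical stand-ins, not Spark or connector classes.
class PlaceholderSketch {
    interface Type {}

    static final class StringType implements Type {
        @Override public String toString() { return "string"; }
    }

    // Marks "element type unknown", e.g. the element type of an empty array.
    static final class Placeholder implements Type {
        @Override public String toString() { return "placeholder"; }
    }

    static final class ArrayType implements Type {
        final Type elementType;
        ArrayType(Type elementType) { this.elementType = elementType; }
        @Override public String toString() { return "array<" + elementType + ">"; }
    }

    static final class Field {
        final String name;
        final Type type;
        Field(String name, Type type) { this.name = name; this.type = type; }
    }

    static final class StructType implements Type {
        final List<Field> fields;
        StructType(List<Field> fields) { this.fields = fields; }
        @Override public String toString() {
            List<String> parts = new ArrayList<>();
            for (Field f : fields) {
                parts.add(f.name + ":" + f.type);
            }
            return "struct<" + String.join(",", parts) + ">";
        }
    }

    // Walks the whole type tree, so placeholders at any depth are replaced.
    static Type replacePlaceholders(Type type) {
        if (type instanceof Placeholder) {
            return new StringType(); // assumed fallback type
        }
        if (type instanceof ArrayType) {
            // Recurse into the element type instead of stopping at the first level.
            return new ArrayType(replacePlaceholders(((ArrayType) type).elementType));
        }
        if (type instanceof StructType) {
            List<Field> replaced = new ArrayList<>();
            for (Field f : ((StructType) type).fields) {
                replaced.add(new Field(f.name, replacePlaceholders(f.type)));
            }
            return new StructType(replaced);
        }
        return type; // leaf types are returned unchanged
    }

    public static void main(String[] args) {
        // Models the schema inferred for {arrayField: [[]]} before replacement.
        Type inferred = new StructType(Collections.singletonList(
            new Field("arrayField", new ArrayType(new ArrayType(new Placeholder())))));
        System.out.println(replacePlaceholders(inferred));
        // prints: struct<arrayField:array<array<string>>>
    }
}
```

A first-level-only replacement would handle array<placeholder> but leave array<array<placeholder>> untouched, which matches the symptom in the stack trace, where RowEncoder meets a type it cannot match.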
Suggest adding a test case to InferSchemaTest.java:
assertAll(() -> assertEquals(
    createStructType(singletonList(createStructField(
        "arrayField",
        createArrayType(createArrayType(DataTypes.StringType, true), true)))),
    InferSchema.inferSchema(
        singletonList(BsonDocument.parse("{arrayField: [[]]}")),
        READ_CONFIG)));
If you inspect the failure of this test, you'll see that the comparison fails because of a mismatch between the expected String type and the placeholder object, indicating that the placeholder was never replaced.