  1. Spark Connector
  2. SPARK-257

Spark query can skip the invalid data type from mongo

    • Type: New Feature
    • Resolution: Fixed
    • Priority: Minor - P4
    • Fix Version/s: 10.0.0
    • Affects Version/s: 2.4.1
    • Component/s: Reads
    • Labels: None
    • Environment:
      Spark 2.4.3 + MongoDB 4.0

      We create a Spark SQL table using com.mongodb.spark.sql and then query it with a filter, which throws an error: Cannot cast STRING into a IntegerType (value: BsonString{value='0'}).

      We looked into the code and found that the error occurs because one field in MongoDB holds values of different data types.

      For example, in the field price, one document holds NumberInt("0"), but in another document the value is "0", which is a String.

      So now we must change the raw data in MongoDB before we can continue our query from Spark.

      Because MongoDB is schemaless, could the connector add an option to skip values whose data type does not match the schema?

      For example, add a configuration option, skip.invalid.datatype, to the connector, defaulting to false.
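
To make the request concrete, here is a minimal sketch of the behavior being asked for, written in plain Python rather than against the connector. Everything in it is an assumption for illustration: read_rows, SCHEMA, and the skip_invalid_datatype parameter are hypothetical names, not part of the connector, and "skip" is taken to mean "replace the mismatched value with null" rather than "drop the whole document".

```python
from typing import Any, Dict, Iterable, Iterator

# Hypothetical schema: field name -> Python type standing in for a Spark SQL type.
SCHEMA: Dict[str, type] = {"price": int}


def read_rows(
    docs: Iterable[Dict[str, Any]],
    schema: Dict[str, type],
    skip_invalid_datatype: bool = False,
) -> Iterator[Dict[str, Any]]:
    """Yield one row per document, checking each value against the schema.

    With skip_invalid_datatype=False (the proposed default) a mismatched
    value raises an error. With True, the mismatched value becomes None.
    """
    for doc in docs:
        row: Dict[str, Any] = {}
        for field, expected in schema.items():
            value = doc.get(field)
            if value is None or type(value) is expected:
                row[field] = value
            elif skip_invalid_datatype:
                # Assumption: "skip" means null out the value, keep the row.
                row[field] = None
            else:
                raise TypeError(
                    f"Cannot cast {type(value).__name__} into "
                    f"{expected.__name__} (field: {field}, value: {value!r})"
                )
        yield row


# Two documents like the ones described above: an integer 0 and a string "0".
docs = [{"price": 0}, {"price": "0"}]
rows = list(read_rows(docs, SCHEMA, skip_invalid_datatype=True))
```

With the option off, the second document fails the read, as it does today. With it on, the query can proceed and the mismatched price comes back as null, so the raw data in MongoDB does not have to be changed first.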
       

            Assignee:
            ross@mongodb.com Ross Lawley
            Reporter:
            zgcsky08@163.com zhou bill
            Votes:
            0
            Watchers:
            4
