Spark Connector / SPARK-158

Null value in String column is converted to the string "null"

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 2.2.2, 2.1.2
    • Affects Version/s: 2.2.0
    • Component/s: Schema
    • Environment:
      Spark 2.2.0, MongoDB 3.2

      Input collection:

      {"a" : 123, "b" : "abc", "c" : "xxx"} {"a" : 111, "b" : "aaa", "c" : "yyy"} {"a" : null, "b" : null, "c" : "zzz"}

      After loading this collection (with or without providing a schema), the value of column "b" in the third row is the string "null" instead of null.
      You can see it in several ways:

      • Calling collectAsList() and inspecting the contents
      • Calling testDataset.filter(col("b").isNull()).show(), which prints an empty dataset
      • Saving the dataset to another collection

      Note that testDataset.filter(col("b").isNotNull()) returns a correct result.
      This problem does NOT occur when the column is numeric, as in column "a".
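
      For reference, a minimal reproduction sketch (Scala, Spark Connector 2.2.x) following the steps above; the URIs, database, and collection names here are hypothetical placeholders:

          import org.apache.spark.sql.SparkSession
          import org.apache.spark.sql.functions.col
          import com.mongodb.spark.MongoSpark

          object Spark158Repro {
            def main(args: Array[String]): Unit = {
              val spark = SparkSession.builder()
                .master("local[*]")
                .appName("SPARK-158 repro")
                // Hypothetical URIs; point them at the collection shown above.
                .config("spark.mongodb.input.uri", "mongodb://localhost/test.spark158")
                .config("spark.mongodb.output.uri", "mongodb://localhost/test.spark158_out")
                .getOrCreate()

              // Load the collection and let the connector infer the schema.
              val testDataset = MongoSpark.load(spark)

              testDataset.show()                            // row 3 shows the string "null" in column "b"
              testDataset.filter(col("b").isNull).show()    // expected: one row; actual: empty dataset
              testDataset.filter(col("b").isNotNull).show() // returns the correct result

              spark.stop()
            }
          }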

      I debugged the code and found that in MapFunctions, the convertToDataType function returns the string "null" instead of null when the column is of String type and the element is BsonNull.
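
      A minimal sketch of the kind of guard this points to (illustrative only, not the connector's actual MapFunctions code; the names and structure here are assumptions): check for BsonNull before any per-type conversion, so a null BSON value becomes a SQL null rather than a rendered "null" string.

          import org.apache.spark.sql.types.{DataType, StringType}
          import org.bson.{BsonNull, BsonString, BsonValue}

          object ConvertToDataTypeSketch {
            // Illustrative converter: the null check must come first, for every target type.
            def convertToDataType(element: BsonValue, elementType: DataType): Any =
              if (element.isNull) {
                null // BsonNull -> SQL NULL, regardless of the target type
              } else {
                elementType match {
                  case StringType => element.asString().getValue // safe: element is not BsonNull here
                  case _          => element                     // other target types elided in this sketch
                }
              }

            def main(args: Array[String]): Unit = {
              println(convertToDataType(new BsonNull(), StringType))        // null
              println(convertToDataType(new BsonString("abc"), StringType)) // abc
            }
          }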

            Assignee: Ross Lawley (ross@mongodb.com)
            Reporter: Ofra P (ofrapa)
            Votes: 1
            Watchers: 1
