col.isNotNull() does not work for fields with null values


    • Type: Bug
    • Resolution: Fixed
    • Priority: Unknown
    • 10.2.1
    • Affects Version/s: 10.1.0, 10.1.1, 10.2.0
    • Component/s: API, Schema
    • Documentation Changes: Not Needed

      How to reproduce:

      1. Create sample documents
        db.getSiblingDB("demo").getCollection("bugReproduction").insertMany([
            {
                "first_name": "Sample",
                "last_name": "Null User Id",
                "user_id": null
            },
            {
                "first_name": "Sample",
                "last_name": "Has User Id",
                "user_id": "12345"
            },
            {
                "first_name": "Sample",
                "last_name": "Unset User Id"
            }
        ]); 
      2. Attempt to filter the documents based on the `user_id` field
        // spark is a SparkSession object
        val df = spark
          .read()
          .format("mongodb")
          .option("spark.mongodb.connection.uri", "mongodb://localhost:27017")
          .option("spark.mongodb.database", "demo")
          .option("spark.mongodb.collection", "bugReproduction")
          .load()

        df.where(df.col("user_id").isNotNull()).show()

      Expected output: a single row containing the "Sample Has User Id" document

      Actual output: both "Sample Null User Id" and "Sample Has User Id" documents are included.

      In Spark Connector 10.0.5, this example works as expected.
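
      A possible workaround, shown below as a sketch only, is to push the null check down to the server explicitly so that documents with a null or missing user_id are excluded before Spark applies its own predicate. The aggregation.pipeline read option name and the local SparkSession setup are assumptions based on the 10.x connector configuration, not something confirmed in this report; the database, collection, and connection options mirror the reproduction code above.

        import org.apache.spark.sql.SparkSession

        fun main() {
            // Hypothetical workaround sketch: filter on the server with an explicit
            // aggregation pipeline instead of relying on isNotNull() pushdown.
            val spark = SparkSession.builder()
                .master("local[*]")
                .appName("isNotNullWorkaround")
                .getOrCreate()

            val df = spark.read()
                .format("mongodb")
                .option("spark.mongodb.connection.uri", "mongodb://localhost:27017")
                .option("spark.mongodb.database", "demo")
                .option("spark.mongodb.collection", "bugReproduction")
                // Assumption: the connector exposes an aggregation.pipeline read option.
                // { user_id: { $ne: null } } matches only documents where user_id exists
                // and is not null, which is the expected single-row output above.
                .option("aggregation.pipeline",
                    """[{ "${'$'}match": { "user_id": { "${'$'}ne": null } } }]""")
                .load()

            df.show()
            spark.stop()
        }

      This keeps the null/missing filtering in MongoDB itself rather than depending on the connector translating Spark's IsNotNull filter.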

            Assignee:
            Ross Lawley
            Reporter:
            Nathan Strong
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated:
              Resolved: