How to reproduce:
- Create sample documents
db.getSiblingDB("demo").getCollection("bugReproduction").insertMany([
  { "first_name": "Sample", "last_name": "Null User Id", "user_id": null },
  { "first_name": "Sample", "last_name": "Has User Id", "user_id": "12345" },
  { "first_name": "Sample", "last_name": "Unset User Id" }
]);
- Attempt to filter the documents based on the `user_id` field
// spark is a SparkSession object
val df = spark
    .read()
    .format("mongodb")
    .option("spark.mongodb.connection.uri", "mongodb://localhost:27017")
    .option("spark.mongodb.database", "demo")
    .option("spark.mongodb.collection", "bugReproduction")
    .load()

// Keep only the rows whose user_id is not null
df.where(df.col("user_id").isNotNull()).show()
Expected output: a single row containing the "Sample Has User Id" document
Actual output: both "Sample Null User Id" and "Sample Has User Id" documents are included.
In Spark Connector 10.0.5, this example works as expected.
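For reference, the expected result can be modelled in plain JavaScript, with no MongoDB or Spark involved. This is only a sketch of the semantics the report assumes, not the connector's implementation: `isNotNull` is a hypothetical helper that treats both an explicit null and a missing field as null, as SQL-style IS NOT NULL does.

```javascript
// The three sample documents from the reproduction steps.
const docs = [
  { first_name: "Sample", last_name: "Null User Id", user_id: null },
  { first_name: "Sample", last_name: "Has User Id", user_id: "12345" },
  { first_name: "Sample", last_name: "Unset User Id" },
];

// Hypothetical helper (not part of any library): true only when the field
// is present and holds a non-null value.
function isNotNull(doc, field) {
  return doc[field] !== null && doc[field] !== undefined;
}

// Rows the filter is expected to keep.
const expected = docs.filter((doc) => isNotNull(doc, "user_id"));

console.log(expected.map((doc) => `${doc.first_name} ${doc.last_name}`));
// prints [ 'Sample Has User Id' ]
```

Under this model only the "Sample Has User Id" document survives, which matches the expected output stated above.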
Related to: SPARK-376 Use the schema to automatically project fields (Closed)