- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: 10.1.0
- Component/s: None
Hello,
TL;DR: when loading a MongoDB collection in Spark with Mongo Spark connector 10.1.0 or 10.1.1, I cannot filter on columns whose names contain one or more dashes (e.g. "my-column-name"); the filtering operation always leaves me with no data at all. This was working fine with Mongo Spark connector 10.0.5.
We recently moved to Mongo Spark connector 10.1.0 and had to revert to 10.0.5 because the upgrade made our Spark jobs behave abnormally.
We're coding in Java 17, using Scala 2.12 and Spark 3.3.1, and reading data from MongoDB collections. To summarise our investigations, we observe the following behaviour (omitting unnecessary details like the "_id" field):
var data = sparkSession.read()
    .format("mongodb")
    // ...
    .load();

// The data we're using...
data.show();
// +----+--------+
// |okay|not-okay|
// +----+--------+
// |null|     foo|
// | bar|    null|
// +----+--------+

// This filters data as expected...
data.filter(col("okay").isNotNull()).show();
// +----+--------+
// |okay|not-okay|
// +----+--------+
// | bar|    null|
// +----+--------+

// But not if the column name has a dash!
data.filter(col("not-okay").isNotNull()).show();
// +----+--------+
// |okay|not-okay|
// +----+--------+
// +----+--------+
The behaviour is the same whether I use "filter" or "where", and whether the condition is expressed as a column or as a string: the result is always empty. Renaming the column with "withColumnRenamed" doesn't help, but creating a new column with "withColumn" does, so it feels as if the original column is somehow corrupted. Wrapping the column with "coalesce(col("not-okay"), lit(false))" also helps (see the sketch below).
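For reference, a minimal sketch of these observations, continuing from the "data" Dataset in the snippet above (the "notOkay"/"notOkayCopy" names and the equalTo("foo") predicate are illustrative assumptions, not part of the report):

import static org.apache.spark.sql.functions.coalesce;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.lit;

// Renaming the dashed column does NOT help: the filter still returns no rows.
data.withColumnRenamed("not-okay", "notOkay")
    .filter(col("notOkay").isNotNull())
    .show(); // empty

// Recreating the column with withColumn DOES help.
data.withColumn("notOkayCopy", col("not-okay"))
    .filter(col("notOkayCopy").isNotNull())
    .show(); // rows are back

// Wrapping the column in coalesce also helps. The report uses
// coalesce(col("not-okay"), lit(false)); lit("") is substituted here so the
// types line up with the string column of the toy example.
data.filter(coalesce(col("not-okay"), lit("")).equalTo("foo")).show();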
We have many MongoDB collections whose field keys contain one or more dashes, and it is really impractical to change the collections (we don't own them) or to rework our Spark jobs to avoid this.
I kind of feel helpless after spending a couple of days figuring out what was going on with our Spark jobs. I hope this issue will help!
Regards,
Sébastien
- related to: SPARK-432 "Broken reading filter on structures field with dash character" (Closed)