- Type: Task
- Resolution: Fixed
- Priority: Unknown
- Affects Version/s: None
- Component/s: None
- None
- Java Drivers
- Not Needed
TLDR: A user provided a fix for an issue they discovered. Review, test, and merge the code if valid.
USER REPORTED ISSUE:
MongoDB Spark Connector has a bug in the method: isJsonObjectOrArray
File: RowToBsonDocumentConverter.java
Method: isJsonObjectOrArray (Line: 221)
Ref: https://github.com/mongodb/mongo-spark/blob/main/src/main/java/com/mongodb/spark/sql/connector/schema/RowToBsonDocumentConverter.java?#L221
Issue: The code always assumes the string is non-empty and reads its first and last characters by index. When the data contains an empty string '', the conversion fails with the error: Can not convert to Bson, Index out of range
Suggested Fix: This is just a boolean method that checks whether the value should be converted to BSON. Would simply returning false for empty strings be enough?
OR provide an easier way to convert ONLY a specified column to BSON. We only need this for one column (Id to ObjectId), but because that is not supported, we have to use the top-level option, which applies to all objects and arrays.
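The failure mode described above can be reproduced outside the connector. The following is a minimal, self-contained sketch, not the connector's actual source: the class name `JsonShapeCheck`, the method names `unguarded` and `guarded`, the constant values, and the array branch of the check are all assumptions made for illustration. It shows that indexing into an empty string throws, and that a length guard makes the check return false instead.

```java
// Hypothetical stand-alone demo; not part of mongo-spark.
final class JsonShapeCheck {
    // Assumed values; the ticket only shows the object constants by name.
    private static final char JSON_OBJECT_START = '{';
    private static final char JSON_OBJECT_END = '}';
    // The array constants and branch are an assumption based on the method name.
    private static final char JSON_ARRAY_START = '[';
    private static final char JSON_ARRAY_END = ']';

    // No guard: charAt(0) throws an IndexOutOfBoundsException for "".
    static boolean unguarded(final String data) {
        char firstChar = data.charAt(0);
        char lastChar = data.charAt(data.length() - 1);
        return (firstChar == JSON_OBJECT_START && lastChar == JSON_OBJECT_END)
                || (firstChar == JSON_ARRAY_START && lastChar == JSON_ARRAY_END);
    }

    // Guarded: null and strings shorter than two characters cannot be
    // a JSON object or array, so return false before indexing.
    static boolean guarded(final String data) {
        if (data == null || data.length() < 2) {
            return false;
        }
        return unguarded(data);
    }

    public static void main(String[] args) {
        try {
            unguarded("");
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unguarded(\"\") threw " + e.getClass().getSimpleName());
        }
        System.out.println("guarded(\"\") = " + guarded(""));
        System.out.println("guarded(\"{}\") = " + guarded("{}"));
    }
}
```

A single `data.length() < 2` test already covers the empty string, so a separate `isEmpty()` check is not strictly needed; keeping both is harmless.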
USER PROVIDED FIX
I tried the fix below on a cloned repo and tested it; it appears to resolve the issue.
I added the first three lines of the method body below to isJsonObjectOrArray.
Could this fix be applied to the mongo-spark connector repo, and an updated .jar file be released?
{{private static boolean isJsonObjectOrArray(final String data) {
    if (data == null || data.isEmpty() || data.length() < 2) {
        return false;
    }
    char firstChar = data.charAt(0);
    char lastChar = data.charAt(data.length() - 1);
    return (firstChar == JSON_OBJECT_START && lastChar == JSON_OBJECT_END)