- Type: New Feature
- Resolution: Fixed
- Priority: Minor - P4
- Affects Version/s: 3.0.1
- Needed
Summary
A MongoTypeConversionException is thrown during a Spark read when bad/corrupt fields are present in a large collection. Adding support for parse modes such as Permissive or DropMalformed as MongoDB Spark connector options would allow MongoSpark reads to complete successfully. A usage sketch of the proposed option follows.
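For reference, Spark's built-in JSON and CSV datasources already expose this behavior through the mode read option (PERMISSIVE, DROPMALFORMED, FAILFAST). Below is a minimal sketch of how an analogous option might look on a MongoSpark read. Note that the mode option on the mongo format is the feature being proposed here, not an existing connector option, and the URI and schema are placeholders.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object PermissiveMongoRead {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("mongo-read-modes")
      .master("local[*]")
      .getOrCreate()

    // Supplying an explicit schema avoids scanning the whole collection for inference.
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("age", IntegerType, nullable = true)
    ))

    // Proposed usage: "mode" is the new connector option this ticket requests,
    // mirroring the PERMISSIVE / DROPMALFORMED / FAILFAST modes of Spark's
    // JSON and CSV datasources. It does not exist in connector 3.0.1.
    val df = spark.read
      .format("mongo")                                   // connector 3.x short name
      .option("uri", "mongodb://localhost/test.people")  // placeholder URI
      .option("mode", "DROPMALFORMED")                   // proposed: skip records that fail type conversion
      .schema(schema)
      .load()

    df.show()
  }
}

Under DROPMALFORMED the connector would skip documents whose fields fail type conversion against the supplied schema, while PERMISSIVE would null out only the offending fields; either avoids aborting the entire read on a single corrupt record.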
Motivation
Who is the affected end user?
Big data management companies
How does this affect the end user?
The DataFrame read operation fails when corrupt records are present.
How likely is it that this problem or use case will occur?
Likely for any large MongoDB collection holding unstructured data, where scanning the entire collection to infer the schema incurs a performance overhead.
Also whenever an explicit schema is passed during a Spark DataFrame read.
If the problem does occur, what are the consequences and how severe are they?
Severe: the Spark read fails with a MongoTypeConversionException even when a single corrupt record is present in a collection of thousands of rows.
Is this issue urgent?
Yes; supporting these modes will prevent read operations from breaking.
Is this ticket required by a downstream team (e.g. Atlas, Shell, Compass)?
Is this ticket only for tests?
No
Cast of Characters
Engineering Lead:
Document Author:
POCers:
Product Owner:
Program Manager:
Stakeholders: