- Type: Bug
- Resolution: Unresolved
- Priority: Unknown
- Affects Version/s: None
- Component/s: Partitioners
- Java Drivers
> https://github.com/apache/spark/pull/42099
> https://issues.apache.org/jira/browse/SPARK-44505
> Previously, whenever a new DSv2 data source was implemented, Spark would always call `BatchScanExec.supportsColumnar` during planning, which in turn iterates over all input partitions to check whether they support columnar reads.
> When the `planInputPartitions` method is expensive, this can be problematic. This patch adds an option to the Scan interface that allows specifying a default value. For backward compatibility, the default value provided by the Scan interface is partition defined, but a Scan implementation can override it.
In short, the MongoDB Spark Connector currently falls back to the default of `ColumnarSupportMode.PARTITION_DEFINED`, so `planInputPartitions` is called repeatedly during planning.
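For reference, this is roughly the shape of the hook that SPARK-44505 added to the `Scan` interface in Spark 3.5.0 (abridged sketch; the `SUPPORTED` value and the comments are recalled from the upstream PR rather than quoted from it):

```java
// Abridged sketch of org.apache.spark.sql.connector.read.Scan as of Spark 3.5.0.
public interface Scan {
    enum ColumnarSupportMode {
        PARTITION_DEFINED, // ask each InputPartition, i.e. the legacy behavior
        SUPPORTED,         // every partition can produce columnar batches
        UNSUPPORTED        // no partition produces columnar batches
    }

    // Defaults to PARTITION_DEFINED for backward compatibility, which is why
    // connectors that do not override it still pay the planInputPartitions cost.
    default ColumnarSupportMode columnarSupportMode() {
        return ColumnarSupportMode.PARTITION_DEFINED;
    }
}
```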
Since Spark 3.5.0, the `Scan` interface allows the implementing class (`com.mongodb.spark.sql.connector.read.MongoScan`) to override the `columnarSupportMode` method. I think we can change it to return `ColumnarSupportMode.UNSUPPORTED` to avoid these unnecessary calls.
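A minimal sketch of the proposed override, assuming `MongoScan` implements the Spark 3.5.0 `Scan` interface (the `readSchema()` body here is an illustrative placeholder; the real class derives the schema from its read configuration):

```java
import org.apache.spark.sql.connector.read.Scan;
import org.apache.spark.sql.types.StructType;

public class MongoScan implements Scan {

    @Override
    public StructType readSchema() {
        // Placeholder so the sketch compiles; the real MongoScan infers or
        // loads the schema from the connector's read configuration.
        return new StructType();
    }

    // Proposed change: the connector never returns columnar batches, so
    // declare that up front. BatchScanExec.supportsColumnar can then be
    // answered without calling planInputPartitions().
    @Override
    public ColumnarSupportMode columnarSupportMode() {
        return ColumnarSupportMode.UNSUPPORTED;
    }
}
```

With this override, Spark can answer `supportsColumnar` from the `Scan` itself instead of materializing the input partitions during planning.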