Spark Connector / SPARK-434

Override columnar support in MongoScan

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Unknown
    • Affects Version/s: None
    • Component/s: Partitioners
    • Java Drivers

      > https://github.com/apache/spark/pull/42099

      > https://issues.apache.org/jira/browse/SPARK-44505

      > Previously, when a new DSv2 data source is implemented, during planning Spark will always call BatchScanExec.supportsColumnar, which in turn iterates over all input partitions to check whether they support columnar execution.

      > When the planInputPartitions method is expensive, this can be problematic. This patch adds an option to the Scan interface that allows specifying a default value. For backward compatibility, the default value provided by the Scan interface is partition-defined, but a Scan can change it accordingly.

       

      In short, the MongoDB Spark connector currently falls back to the default `ColumnarSupportMode.PARTITION_DEFINED`, so `planInputPartitions` is called repeatedly just to determine columnar support.

       

      Since Spark 3.5.0, the Scan interface allows an implementation class (`com.mongodb.spark.sql.connector.read.MongoScan`) to override the `columnarSupportMode` method. I think we can change it to return `ColumnarSupportMode.UNSUPPORTED` to avoid these unnecessary calls.
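      The mechanism can be sketched as follows. This is a minimal, self-contained model of the Spark 3.5+ behavior, not the real Spark API: the `Scan` interface, `ColumnarSupportMode` enum, and `MongoScanLike` class below only mirror `org.apache.spark.sql.connector.read.Scan` and `MongoScan` for illustration.

      ```java
      public class ColumnarSupportSketch {

          // Mirrors the ColumnarSupportMode enum added to Scan in SPARK-44505.
          public enum ColumnarSupportMode { PARTITION_DEFINED, SUPPORTED, UNSUPPORTED }

          public interface Scan {
              // Backward-compatible default: the planner must call
              // planInputPartitions and inspect every partition to decide.
              default ColumnarSupportMode columnarSupportMode() {
                  return ColumnarSupportMode.PARTITION_DEFINED;
              }
          }

          // What MongoScan could do: declare up front that columnar reads are
          // unsupported, so the planner never needs to plan partitions just to
          // answer supportsColumnar.
          public static class MongoScanLike implements Scan {
              @Override
              public ColumnarSupportMode columnarSupportMode() {
                  return ColumnarSupportMode.UNSUPPORTED;
              }
          }

          public static void main(String[] args) {
              Scan defaultScan = new Scan() {};
              System.out.println(defaultScan.columnarSupportMode());         // PARTITION_DEFINED
              System.out.println(new MongoScanLike().columnarSupportMode()); // UNSUPPORTED
          }
      }
      ```

      Because `columnarSupportMode` is a default method, the change is backward compatible: existing Scan implementations keep the old partition-inspecting behavior, and only a one-method override is needed in MongoScan.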

            Assignee:
            Unassigned
            Reporter:
            Zack Sun
            Votes:
            0
            Watchers:
            3
