Spark Connector / SPARK-434

Override columnar support in MongoScan

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Unknown
    • Affects Version/s: None
    • Component/s: Partitioners
    • Java Drivers

      > https://github.com/apache/spark/pull/42099

      > https://issues.apache.org/jira/browse/SPARK-44505

      > Previously, when a new DSv2 data source is implemented, during planning Spark will always call BatchScanExec.supportsColumnar, which in turn iterates over all input partitions to check whether they support columnar execution.

      > When the planInputPartitions method is expensive, this can be problematic. This patch adds an option to the Scan interface that allows specifying a default value. For backward compatibility, the default value provided by the Scan interface is partition-defined, but a Scan can change it accordingly.

       

      In short, the MongoDB Spark connector currently falls back to the default `ColumnarSupportMode.PARTITION_DEFINED`, so `planInputPartitions` is called repeatedly just to determine columnar support.

       

      Since Spark 3.5.0, the Scan interface allows an implementation class (`com.mongodb.spark.sql.connector.read.MongoScan`) to override the `columnarSupportMode` method. I think we can change it to return `ColumnarSupportMode.UNSUPPORTED` to avoid these unnecessary calls.
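      The mechanism can be sketched as follows. This is a minimal, self-contained model of the Spark 3.5+ behavior, not the real Spark API: the `Scan` interface, `ColumnarSupportMode` enum, and `MongoScanLike` class below only mirror `org.apache.spark.sql.connector.read.Scan` and `MongoScan` for illustration.

      ```java
      public class ColumnarSupportSketch {

          // Mirrors the ColumnarSupportMode enum added to Scan in SPARK-44505.
          public enum ColumnarSupportMode { PARTITION_DEFINED, SUPPORTED, UNSUPPORTED }

          public interface Scan {
              // Backward-compatible default: the planner must call
              // planInputPartitions and inspect every partition to decide.
              default ColumnarSupportMode columnarSupportMode() {
                  return ColumnarSupportMode.PARTITION_DEFINED;
              }
          }

          // What MongoScan could do: declare up front that columnar reads are
          // unsupported, so the planner never needs to plan partitions just to
          // answer supportsColumnar.
          public static class MongoScanLike implements Scan {
              @Override
              public ColumnarSupportMode columnarSupportMode() {
                  return ColumnarSupportMode.UNSUPPORTED;
              }
          }

          public static void main(String[] args) {
              Scan defaultScan = new Scan() {};
              System.out.println(defaultScan.columnarSupportMode());         // PARTITION_DEFINED
              System.out.println(new MongoScanLike().columnarSupportMode()); // UNSUPPORTED
          }
      }
      ```

      Because `columnarSupportMode` is a default method, the change is backward compatible: existing Scan implementations keep the old partition-inspecting behavior, and only a one-method override is needed in MongoScan.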

            Assignee:
            Unassigned
            Reporter:
            Zack Sun
            Votes:
            0
            Watchers:
            3
