Override columnar support in MongoScan

XMLWordPrintableJSON

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Unknown
    • None
    • Affects Version/s: None
    • Component/s: Partitioners
    • None
    • Java Drivers
    • None
    • None
    • None
    • None
    • None
    • None

      > https://github.com/apache/spark/pull/42099

      > https://issues.apache.org/jira/browse/SPARK-44505

      > Previously, when a new DSv2 data source is implemented during planning, it will always call BatchScanExec:supportsColumnar which will in turn iterate over all input partitions to check if they support columnar or not.

      > When the planInputPartitions method is expensive this can be problematic. This patch adds an option to the Scan interface that allows specifying a default value. For backward compatibility the default value provided by the Scan interface is partition defined, but a Scan can change it accordingly.

       

      In short, mongo spark now uses the default value of `ColumnarSupportMode.PARTITION_DEFINED`, and `planInputPartitions` will be called repeatedly.

       

      After Spark 3.5.0, the Scan interface allows the implementation class (`com.mongodb.spark.sql.connector.read.MongoScan`) to override the `columnarSupportMode` method. And I think we can change it to `ColumnarSupportMode.UNSUPPORTED` to avoid unnecessary calls.

            Assignee:
            Ross Lawley
            Reporter:
            Zack Sun
            None
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated: