Spark Connector / SPARK-69

Make querying and declaring unsupported BSON types easier.

    • Type: Improvement
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 1.1.0
    • Affects Version/s: 1.0.0
    • Component/s: API
    • Labels: None
    • Environment: Databricks notebook

Description:

For users not using case classes or Java beans, declaring the schema for unsupported BSON types and querying them is painful.

Declaring a schema for unsupported BSON types requires in-depth knowledge of the underlying StructType. Add helpers so users can easily declare the schema. Querying those unsupported BSON types also requires the same in-depth knowledge of the StructType. Add helpers to make this process easier.
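For illustration, a minimal PySpark sketch of what that declaration and query look like without helpers. It assumes (not confirmed in this ticket) that the connector represents an ObjectId as a struct with a single string field named oid; the field names and sample value are illustrative.

    from pyspark.sql.types import StructType, StructField, StringType

    # Assumed representation: the connector stores an ObjectId (which has
    # no native Spark SQL type) as a struct with one string field, "oid".
    object_id_type = StructType([StructField("oid", StringType())])

    schema = StructType([
        StructField("_id", object_id_type),   # unsupported BSON type
        StructField("name", StringType()),    # ordinary supported field
    ])

    # Querying the unsupported type then means reaching into the struct
    # by hand, e.g.:
    #   df.filter(df["_id"]["oid"] == "56cdbe30b8b39a27d383e6d4")
    # That per-type knowledge is what the requested helpers would hide.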

Was:

Using .filter on top of the Spark connector is dropping rows from the DataFrame

I'm pulling data from MongoDB Atlas into a Spark DataFrame using Spark Connector 1.0.0 with Python. I'm specifying my schema explicitly because some fields contain mixed datatypes.
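A sketch of that read, assuming the 1.0.x DefaultSource format name and the sqlContext predefined in a Databricks notebook of that era; the URI, database, and collection names are placeholders, and schema is a user-built StructType like the one above.

    # Placeholder URI; the database and collection are taken from its path.
    df = (sqlContext.read
          .format("com.mongodb.spark.sql.DefaultSource")
          .option("uri", "mongodb://user:password@host:27017/mydb.mycoll")
          .schema(schema)   # explicit schema, needed for mixed-type fields
          .load())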

When I specify a filter (.filter, .where, and SQL queries all behave identically here) before evaluating, AND that filter is a string comparison, AND I use equality (==), I always get zero results. Some troubleshooting:
--Evaluating without filters returns all data, as it should.
--Evaluating the same field with a different boolean expression (!=) works as it should.
--Evaluating the same expression on local data works fine. This holds for local dummy data as well as for cached results. My current workaround is to cache all of the data before applying filters (sketched below).
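A sketch of the reported behaviour and the caching workaround; the field name and value are stand-ins for the real data.

    df.filter(df["status"] != "active").count()   # != works as expected
    df.filter(df["status"] == "active").count()   # == unexpectedly returns 0

    # Workaround from the report: cache first, so the equality filter is
    # evaluated against the cached rows rather than pushed down to the
    # source.
    df.cache()
    df.filter(df["status"] == "active").count()   # now returns matching rows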

Assignee: Ross Lawley (ross@mongodb.com)
Reporter: Mark Brenckle (brencklebox)
Votes: 0
Watchers: 2
