  Spark Connector / SPARK-71

Support for Spark's MapType() for variable data

    • Type: Improvement
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 1.1.0
    • Affects Version/s: 1.0.0
    • Component/s: Schema
    • Environment:
      PySpark SQL 1.6.2 on Databricks

      Description:

      Some of our data in MongoDB is of a "map" type. It is represented in MongoDB as an Object with a number of possible fields within it, each of a defined type, e.g.:

      { ..., "arbitrary_key": { <sub_object> }, ... }

      We can read in this variable-field schema using Spark's MapType(), which lets us specify the types of the keys and values without hardcoding the field names or the number of entries in the map. This works fine with the MongoDB Spark connector when specifying the schema for reading.
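      A minimal sketch of the read path, assuming a 1.x connector with spark.mongodb.input.uri configured; the field names ("attributes", "score") and the sub-document shape are placeholders, not from the actual collection:

```python
from pyspark.sql.types import (StructType, StructField, StringType,
                               IntegerType, MapType)

# Hypothetical shape of each sub-document stored under the arbitrary keys.
sub_object = StructType([StructField("score", IntegerType(), True)])

schema = StructType([
    StructField("_id", StringType(), True),
    # MapType(keyType, valueType): arbitrary string keys, one shared value
    # shape, so nothing about the key names needs hardcoding.
    StructField("attributes", MapType(StringType(), sub_object), True),
])

# sqlContext is predefined on Databricks / in the PySpark shell; the
# collection to read comes from the spark.mongodb.input.uri setting.
df = (sqlContext.read
      .format("com.mongodb.spark.sql")
      .schema(schema)
      .load())
```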

      The issue comes when writing back out using the same schema. Python dictionaries can be passed in for MapType fields when building DataFrames, but writing with the connector (using the same schema as for reading) produces the following error:

      Cannot cast Map (example) into a BsonValue. MapType (schema) has no matching BsonValue.
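      A sketch of the failing round trip, reusing the hypothetical schema above (the write target comes from the spark.mongodb.output.uri setting):

```python
from pyspark.sql import Row

# A plain Python dict satisfies the MapType field when building the DataFrame.
rows = [("doc1", {"arbitrary_key": Row(score=1)})]
out = sqlContext.createDataFrame(rows, schema)

# With connector 1.0.0 this write raises the error quoted above.
(out.write
    .format("com.mongodb.spark.sql")
    .mode("append")
    .save())
```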

      Is it possible to add support for writing MapType objects to MongoDB using the connector? It seems they would need to be converted by the connector from dictionary-like objects into BSON documents in order to be written.
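      Until such support exists, one interim workaround we could imagine (entirely hypothetical, and it changes the stored document shape) is to flatten each map into an array of {key, value} structs, which the connector can already cast to BSON:

```python
from pyspark.sql.functions import udf, col
from pyspark.sql.types import ArrayType

# Each map entry becomes a {key, value} struct; sub_object is the value
# shape from the schema sketch above.
entry_type = ArrayType(StructType([
    StructField("key", StringType(), True),
    StructField("value", sub_object, True),
]))

map_to_entries = udf(
    lambda m: [{"key": k, "value": v} for k, v in m.items()]
              if m is not None else None,
    entry_type)

# Replace the map column with its array-of-entries form before writing.
writable = out.withColumn("attributes", map_to_entries(col("attributes")))
```

      The trade-off is that the data lands in MongoDB as an array of key/value pairs rather than as a plain Object, so native MapType support in the connector would still be the preferable fix.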

            Assignee:
            Ross Lawley (ross@mongodb.com)
            Reporter:
            Mark Brenckle (brencklebox)
            Votes:
            0
            Watchers:
            2
