Type: Improvement
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 2.1.3, 2.2.4, 2.3.0
Affects Version/s: None
Component/s: Performance, Schema
Labels: None

Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name: None
Goal Link: None

When the DataFrame API is used to load a MongoDB collection containing a field whose keys vary widely from document to document, the schema inference step generates a very large schema, which leads to long wait times or OutOfMemory errors.

My suggestion is to detect such fields and convert them to a MapType.
A field would need to meet two requirements to be detected as a MapType:
1. All of its keys are of the same or a compatible type, and likewise all of its values.
2. It contains more than n keys (n should probably be configurable).
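
The two requirements above could be sketched roughly as follows. This is plain Python for illustration only, not the connector's code: the function name `infer_map_value_type`, the `min_keys` parameter, its default of 250, and the rule that only int and float count as "compatible" are all assumptions made for this example.

```python
from typing import Any, Dict, Iterable, Optional


def infer_map_value_type(
    samples: Iterable[Dict[str, Any]],
    min_keys: int = 250,  # hypothetical threshold, the "n" from requirement 2
) -> Optional[type]:
    """Return the common value type if the sampled sub-documents look like
    a map, or None if the field should stay a regular struct."""
    keys = set()
    value_types = set()
    for doc in samples:
        for key, value in doc.items():
            keys.add(key)
            value_types.add(type(value))

    # Requirement 2: the field must have more than min_keys distinct keys.
    if len(keys) <= min_keys:
        return None

    # Requirement 1: values must share one type or a compatible one.
    # Assumption for this sketch: int and float are compatible and widen
    # to float; any other mix of types disqualifies the field.
    if value_types <= {int, float}:
        return float if float in value_types else int
    if len(value_types) == 1:
        return next(iter(value_types))
    return None


# A field with 301 distinct keys whose values are ints plus one float.
wide = [{f"user_{i}": i for i in range(300)}, {"user_x": 1.5}]
print(infer_map_value_type(wide))
```

Document keys in MongoDB are always strings, so the sketch only checks value types; a field that passes both checks would be represented as a single map of string to that value type instead of one struct field per key.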

I will try to submit a pull request for this.

Is depended on by:
SPARK-194 Configuration Updates (Released)
SPARK-181 Spark Connector 2.3.0 (Closed)

Assignee: Ross Lawley
Reporter: Jochen Niebuhr
Reviewers: None
Votes: 0
Watchers: 2

Created: Aug 04 2017 06:48:31 AM UTC
Updated: Oct 28 2023 10:34:03 AM UTC
Resolved: Jul 13 2018 11:37:18 AM UTC
