Spark Connector / SPARK-73

What is the behaviour of data locality awareness and how do I configure it?

    • Type: Task
    • Resolution: Done
    • Priority: Critical - P2
    • Affects Version/s: 1.0.0
    • Environment:
      Windows Server 2008 R2 Standard

    • Description:

      I want to use the Spark Connector with a Spark cluster and a sharded MongoDB deployment. According to the Spark Connector documentation, it provides data locality awareness, described as "The Spark connector is aware of which MongoDB partitions are storing data."

      Question 1: What is the correct behaviour of this feature? In my test environment there are two machines; each hosts a MongoDB shard and also runs a Spark worker. When I create an RDD with JavaMongoRDD<Document> items = MongoSpark.load(sc, readConfig); I find that the partitions are not always computed on the worker running on the same machine as the MongoDB shard that holds the data. The placement appears random.
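      To see where Spark intends to run each partition, I used roughly the check below. This is a hypothetical fragment (not part of my original job) that continues from the items RDD above and prints the preferred locations the connector attaches to each partition, which is what Spark's scheduler uses for locality:

      // Hypothetical locality check; requires: import org.apache.spark.Partition;
      // Spark tries to schedule each task on one of the hosts listed for its partition.
      for (Partition p : items.rdd().partitions()) {
          System.out.println("partition " + p.index()
                  + " prefers " + items.rdd().preferredLocations(p));
      }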

      Question 2: How do I configure this feature? I have searched for solutions and tested the configurations below, but neither works (a full sketch of how I apply these overrides follows the list):
      1. readOverrides.put("readpreference.name", "nearest"); — this seems intended for failover/disaster recovery rather than locality.
      2. readOverrides.put("partitioner", "MongoShardedPartitioner"); — has no visible effect.

            Assignee:
            Unassigned
            Reporter:
            xiefeng
            Votes:
            0
            Watchers:
            3
