Type: Task
Resolution: Done
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Documentation
Labels: None

I have a sharded cluster consisting of 6 Nodes:

3 Replica Sets (rs0, rs1, rs2)
3 Config Servers
3 Mongos
Each replica set consists of two shardsvr instances (one primary, one secondary) and an arbiter.

When I use the MongoSpark Connector to connect to my cluster, I use these settings to connect to the mongos instances:

  import org.apache.spark.SparkConf

  // Read through all three mongos routers; write through one of them.
  val conf = new SparkConf()
    .setAppName("Cluster Application")
    .set("spark.mongodb.input.uri",
      "mongodb://hadoopb24:27017,hadoopb30:27017,hadoopb36:27017/test.data")
    .set("spark.mongodb.output.uri", "mongodb://hadoopb24:27017/test.myCollection")

I launch 6 executors to parallelize the operations. However, only two executors are doing all the work.

I also tried changing the readPreference.name setting to "nearest", which should allow reads from both primaries and secondaries, but it barely helps.
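For reference, the read preference change described above might look like the following. This is a sketch rather than the exact configuration used: the host names and namespace are copied from the settings earlier in this report, and the property name `spark.mongodb.input.readPreference.name` is assumed to be the connector 2.x input option that the `readPreference.name` setting refers to.

```scala
import org.apache.spark.SparkConf

// Sketch only: hosts and namespace are taken from the report above.
val conf = new SparkConf()
  .setAppName("Cluster Application")
  .set("spark.mongodb.input.uri",
    "mongodb://hadoopb24:27017,hadoopb30:27017,hadoopb36:27017/test.data")
  // Assumed property name (connector 2.x input option): let reads go to the
  // lowest-latency member, whether it is a primary or a secondary.
  .set("spark.mongodb.input.readPreference.name", "nearest")
```

A read preference only affects which replica set member serves a read. It does not by itself change how many Spark partitions are created, which may be why it makes little difference here.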

Each replica set holds part of the data that is returned. My question: do I have to connect to each of the shardsvr instances directly rather than to the mongos? I expected that going through the mongos would give the best results. How can I achieve better parallelism?
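One option that may be relevant to this question is the connector's input partitioner, which decides how a collection is split into Spark partitions. The sketch below assumes connector 2.x property names (`spark.mongodb.input.partitioner` and `spark.mongodb.input.partitionerOptions.shardkey`) and assumes the collection is sharded on `_id`, which this report does not state. Whether it fixes the imbalance described here has not been verified.

```scala
import org.apache.spark.SparkConf

// Sketch only: hosts and namespace are taken from the report above.
val conf = new SparkConf()
  .setAppName("Cluster Application")
  .set("spark.mongodb.input.uri",
    "mongodb://hadoopb24:27017,hadoopb30:27017,hadoopb36:27017/test.data")
  // Assumed connector 2.x option: partition the collection by its chunks
  // instead of using the default partitioner.
  .set("spark.mongodb.input.partitioner", "MongoShardedPartitioner")
  // Assumption: the collection is sharded on _id. Replace with the real shard key.
  .set("spark.mongodb.input.partitionerOptions.shardkey", "_id")
```

With a chunk-based partitioner the connection still goes through the mongos, and the number of Spark partitions follows the number of chunks, so a collection with only a few chunks would still yield only a few tasks.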

The documentation could say a few words on this matter. I think my problem is related to this.

Attachments:
spark4.PNG, 19 kB, Nov 16 2016 12:47:38 PM UTC
spark5.PNG, 21 kB, Nov 16 2016 01:04:51 PM UTC

Assignee: Unassigned
Reporter: F H
Reviewers: None
Votes: 0
Watchers: 2
Created: Nov 16 2016 01:08:17 PM UTC
Updated: Sep 22 2021 06:51:42 PM UTC
Resolved: Nov 17 2016 12:25:55 PM UTC
