When reading data that suffers from high network and I/O latency, a common technique to improve throughput is to parallelize the workload.
For mongodump, parallelizing a single collection scan could work as follows (a rough Go sketch appears after the list):
- $sample the collection and record a set of _id values to use as split points.
- Sort the sampled _id values and add [start _id, end _id] pairs to a work queue.
- Have a configurable number of worker threads pop work items, perform range queries, and write the results.
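As a rough illustration only (not mongodump's actual implementation), here is a minimal Go sketch of the three steps above using the mongo-go-driver. The connection string, the test.mycoll namespace, and the numSplits/numWorkers knobs are all hypothetical, and it assumes _id values are ObjectIDs so they can be sorted bytewise:

```go
// Hypothetical sketch, not mongodump code. Assumes ObjectID _ids and a
// made-up test.mycoll collection; numSplits and numWorkers are invented knobs.
package main

import (
	"bytes"
	"context"
	"fmt"
	"log"
	"sort"
	"sync"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/bson/primitive"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

// idRange is one work item: dump documents with start <= _id < end.
// A nil bound leaves that side of the range open.
type idRange struct {
	start, end *primitive.ObjectID
}

func main() {
	ctx := context.Background()
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(ctx)
	coll := client.Database("test").Collection("mycoll")

	// Step 1: $sample the collection to pick split points.
	const numSplits = 15
	cur, err := coll.Aggregate(ctx, mongo.Pipeline{
		{{Key: "$sample", Value: bson.D{{Key: "size", Value: numSplits}}}},
		{{Key: "$project", Value: bson.D{{Key: "_id", Value: 1}}}},
	})
	if err != nil {
		log.Fatal(err)
	}
	var ids []primitive.ObjectID
	for cur.Next(ctx) {
		var doc struct {
			ID primitive.ObjectID `bson:"_id"`
		}
		if err := cur.Decode(&doc); err != nil {
			log.Fatal(err)
		}
		ids = append(ids, doc.ID)
	}

	// Step 2: sort the sampled _ids and enqueue [start, end) ranges,
	// with open-ended ranges at both extremes.
	sort.Slice(ids, func(i, j int) bool { return bytes.Compare(ids[i][:], ids[j][:]) < 0 })
	work := make(chan idRange, len(ids)+1)
	var prev *primitive.ObjectID
	for i := range ids {
		work <- idRange{start: prev, end: &ids[i]}
		prev = &ids[i]
	}
	work <- idRange{start: prev}
	close(work)

	// Step 3: worker threads pop work items and run range queries.
	const numWorkers = 4
	var wg sync.WaitGroup
	for w := 0; w < numWorkers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for r := range work {
				idCond := bson.M{}
				if r.start != nil {
					idCond["$gte"] = *r.start
				}
				if r.end != nil {
					idCond["$lt"] = *r.end
				}
				filter := bson.M{}
				if len(idCond) > 0 {
					filter["_id"] = idCond
				}
				rc, err := coll.Find(ctx, filter)
				if err != nil {
					log.Fatal(err)
				}
				n := 0
				for rc.Next(ctx) {
					n++ // a real dump would write rc.Current to the output
				}
				fmt.Printf("worker %d dumped %d docs\n", w, n)
			}
		}(w)
	}
	wg.Wait()
}
```

A real implementation would also have to cope with collections whose _id values are of mixed BSON types, which compare by BSON type order rather than bytewise.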
There are trade-offs to this algorithm: sequential scans are replaced with random-access lookups, so I wouldn't recommend making this the default.
Additionally, the feature is less valuable when mongodump is already distributing work by scanning multiple collections in parallel. It's not clear to me if/how this should interact with the existing --numParallelCollections option.
Is related to: TOOLS-3047 Document-level parallelisation for mongorestore (Accepted)