Kafka Connector / KAFKA-366

Parallel bulk writes from sink connector

    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Unknown
    • Affects Version/s: None
    • Component/s: Sink

      In com.mongodb.kafka.connect.sink.StartedMongoSinkTask#put, a collection of records is grouped into batches of writes by namespace (i.e. the MongoDB database and collection name). However, these distinct batches are then written to MongoDB serially, one after another.
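
      Below is a minimal sketch of that serial pattern, for illustration only; the class and method names are invented here and this is not the connector's actual code.

          import com.mongodb.MongoNamespace;
          import com.mongodb.client.MongoClient;
          import com.mongodb.client.MongoCollection;
          import com.mongodb.client.model.WriteModel;
          import org.bson.Document;

          import java.util.List;
          import java.util.Map;

          // Simplified illustration of the behaviour described above: one blocking
          // bulkWrite per namespace, each waiting for the previous one to finish.
          class SerialBulkWriteSketch {
              void writeBatches(MongoClient client,
                                Map<MongoNamespace, List<WriteModel<Document>>> batchesByNamespace) {
                  for (Map.Entry<MongoNamespace, List<WriteModel<Document>>> entry : batchesByNamespace.entrySet()) {
                      MongoCollection<Document> collection = client
                              .getDatabase(entry.getKey().getDatabaseName())
                              .getCollection(entry.getKey().getCollectionName());
                      // Blocks until this namespace's batch is acknowledged before
                      // the next namespace can start.
                      collection.bulkWrite(entry.getValue());
                  }
              }
          }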

      This means that you will get a large drop in performance if

      1. your sink connector consumes from multiple topics
        or
      2. you add transforms that split data from one topic into multiple collections


      My team first noticed this issue during a data rate spike that caused the connector to lag behind by over an hour.

      We should be able to perform these bulk writes in parallel using a thread pool with a configurable pool size. Since each batch is written to a separate collection, ordering will not be impacted. A rough sketch of the idea follows.
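
      The sketch below illustrates one way this could look, assuming a hypothetical pool-size setting; the class, method, and setting names are invented and this is only an outline, not a proposed final implementation.

          import com.mongodb.MongoNamespace;
          import com.mongodb.client.MongoClient;
          import com.mongodb.client.MongoCollection;
          import com.mongodb.client.model.WriteModel;
          import org.bson.Document;

          import java.util.List;
          import java.util.Map;
          import java.util.concurrent.CompletableFuture;
          import java.util.concurrent.ExecutorService;
          import java.util.concurrent.Executors;
          import java.util.stream.Collectors;

          // Sketch: submit each namespace's batch to a fixed-size pool so that writes
          // to different collections overlap, then wait for all of them before
          // acknowledging offsets back to Kafka, keeping delivery semantics unchanged.
          class ParallelBulkWriteSketch {
              private final ExecutorService pool;

              ParallelBulkWriteSketch(int poolSize) {
                  // poolSize would come from a new connector setting,
                  // e.g. a hypothetical "bulk.write.parallelism".
                  this.pool = Executors.newFixedThreadPool(poolSize);
              }

              void writeBatches(MongoClient client,
                                Map<MongoNamespace, List<WriteModel<Document>>> batchesByNamespace) {
                  List<CompletableFuture<Void>> futures = batchesByNamespace.entrySet().stream()
                          .map(entry -> CompletableFuture.runAsync(() -> {
                              MongoCollection<Document> collection = client
                                      .getDatabase(entry.getKey().getDatabaseName())
                                      .getCollection(entry.getKey().getCollectionName());
                              collection.bulkWrite(entry.getValue());
                          }, pool))
                          .collect(Collectors.toList());
                  // Ordering within a namespace is preserved because each namespace is
                  // still a single ordered bulkWrite; only distinct namespaces run
                  // concurrently.
                  CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
              }
          }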

            Assignee:
            Unassigned
            Reporter:
            Martin Andersson (martin.andersson@kambi.com)
            Votes:
            0
            Watchers:
            3
