In com.mongodb.kafka.connect.sink.StartedMongoSinkTask#put, a collection of records is grouped into batches of writes by namespace (i.e. MongoDB database and collection name). However, these distinct batches are then written to MongoDB serially.
This means you will see a large drop in throughput if:
- your sink connector consumes from multiple topics, or
- you add transforms that split data from one topic into multiple collections.
My team first noticed this issue during a data-rate spike that caused the connector to lag by over an hour.
We should be able to perform these bulk writes in parallel using a thread pool with a configurable pool size. Since each batch targets a separate collection, per-collection write ordering would not be affected.
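A minimal sketch of the proposed approach, using a standard `ExecutorService`. Note this is illustrative, not connector code: `writeBatch`, `writeAll`, and the namespace/record types are hypothetical stand-ins for the per-namespace bulk writes that `StartedMongoSinkTask#put` currently issues serially, and the pool size stands in for a new (not yet existing) connector config property.

```java
import java.util.*;
import java.util.concurrent.*;

public class ParallelBatchWrite {

    // Hypothetical stand-in for one namespace's bulk write to MongoDB.
    // Returns the number of records "written".
    static int writeBatch(String namespace, List<Integer> batch) {
        return batch.size();
    }

    // Submits each namespace's batch as an independent task and waits for
    // all of them. Ordering within a namespace is preserved because each
    // batch is still written by exactly one task.
    static int writeAll(Map<String, List<Integer>> batches, int poolSize)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (Map.Entry<String, List<Integer>> e : batches.entrySet()) {
                futures.add(pool.submit(() -> writeBatch(e.getKey(), e.getValue())));
            }
            int written = 0;
            for (Future<Integer> f : futures) {
                written += f.get(); // propagate failures, as the serial path does
            }
            return written;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // One batch per namespace, mirroring the grouping done in put().
        Map<String, List<Integer>> batches = Map.of(
                "db.orders", List.of(1, 2, 3),
                "db.users",  List.of(4, 5),
                "db.events", List.of(6));

        // poolSize would come from a new connector configuration property.
        System.out.println("wrote " + writeAll(batches, 4) + " records");
    }
}
```

Collecting each `Future` and calling `get()` keeps the existing error semantics: a failed bulk write still surfaces an exception from `put`, so Kafka Connect retry/DLQ handling is unchanged.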