- Type: Task
- Resolution: Done
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: None
- Atlas Streams
- Fully Compatible
- Sprint 32, Sprint 33
- 135
https://mongodb.slack.com/archives/C04AH2TF7E1/p1695164357496619
Conversation above ^
–
aadesh 13 days ago
@ kenny.gorman re: throughput numbers for streams
__
aadesh 13 days ago
was chatting with Sandeep, so tomorrow I'm planning on getting a bunch of numbers together with various setups. We have a few Genny workloads set up for streams right now, but those use the in-memory source/sink operators, so they're not super reflective of production setups. The plan is to run those same workloads against a Kafka source in different regions, with streams running in us-east-1, so that we have throughput numbers for Kafka-source pipelines, and then send over a bunch of throughput numbers to you for each workload and source operator setup. How does that generally sound?
- in-memory source operator
- same region kafka -> mstreams
- different regions kafka -> mstreams
will run that setup for every workload we have in Genny ^ along with the avg document size (in bytes) that we're using for those workloads
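For reference, a minimal sketch of what these pipeline variants might look like as Atlas Stream Processing definitions, expressed here as plain Python dicts; the connection names, topic, and target namespace are placeholders, not the actual benchmark config:

```python
# Sketch only: connection names, topic, and namespace below are placeholders.
# Each Genny workload would swap its own stages in between source and sink.

# Kafka source; whether this is same-region or cross-region depends on where
# the "benchKafka" connection points relative to the us-east-1 stream processor.
kafka_source = {"$source": {"connectionName": "benchKafka", "topic": "bench_topic"}}

# Sink into an Atlas collection so throughput is measured end to end.
atlas_sink = {"$merge": {"into": {"connectionName": "benchCluster",
                                  "db": "bench",
                                  "coll": "out"}}}

def build_pipeline(workload_stages):
    """Assemble a benchmark pipeline around a workload's middle stages."""
    return [kafka_source, *workload_stages, atlas_sink]

# Example: a trivial pass-through workload.
pipeline = build_pipeline([{"$match": {}}])
```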
__
aadesh 13 days ago
@ kenny.gorman
__
aadesh 13 days ago
each workload will be a diff type of stream pipeline and document size
__
Sandeep Dhoot 13 days ago
@aadesh we will want to change pipelines later and get these numbers a few times. So please do try to make the whole process repeatable.
__
Joe Niemiec 13 days ago
I think it makes sense; it'll be good to have some baseline idea of the impact of the bandwidth-delay product with cross-region traffic
__
Joe Niemiec 13 days ago
I would make sure you really document what your Kafka setup is as well; Kafka tends to rely heavily on the Linux page cache, so there could be a difference between a Kafka that has buffered properly versus one that hasn't because it's cold (edited)
__
Joe Niemiec 13 days ago
We also have some customers where reading change streams could potentially be cross region or merging to a cluster cross region (edited)
__
kenny.gorman 13 days ago
Maybe I missed it but we need source and sink variations. Like Kafka to Kafka and Kafka to Mongo. To a lesser degree we need change stream source to Mongo.
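A rough sketch of how those source/sink combinations differ at the pipeline level (connection names, topics, and namespaces are again placeholders; the change-stream variant reads from an Atlas collection instead of a Kafka topic):

```python
# Placeholder connection names, topics, and namespaces throughout.
kafka_source = {"$source": {"connectionName": "srcKafka", "topic": "in_topic"}}

# Change-stream source: watch an Atlas collection rather than a Kafka topic.
changestream_source = {"$source": {"connectionName": "srcCluster",
                                   "db": "app", "coll": "events"}}

kafka_sink = {"$emit": {"connectionName": "sinkKafka", "topic": "out_topic"}}
mongo_sink = {"$merge": {"into": {"connectionName": "sinkCluster",
                                  "db": "app", "coll": "out"}}}

combinations = {
    "kafka_to_kafka": [kafka_source, kafka_sink],
    "kafka_to_mongo": [kafka_source, mongo_sink],
    "changestream_to_mongo": [changestream_source, mongo_sink],
}
```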
__
kenny.gorman 13 days ago
Yeah, the exact Kafka config is important. Repeatable is critical. Maybe something anyone can run, not just engineering (thinking field), but maybe I am being too optimistic
__
kenny.gorman 13 days ago
This is awesome guys. Can’t wait to see the results
__
Sandeep Dhoot 13 days ago
Btw sources/sinks will introduce variability in results for N different reasons (source is in a different region, unique kind of Kafka deployment). It would require too much effort to try to cover all the different scenarios. I hope we can just test with only a couple of different scenarios and use those as ballpark numbers. (edited)
__
kenny.gorman 12 days ago
A couple different scenarios is what I meant yes, not all
__
kenny.gorman 12 days ago
The main ones are intra-region Kafka to MongoDB, and intra-region MongoDB to MongoDB. I am not sure (@ joe) if we have lots of Kafka to Kafka use cases just yet.
__
Joe Niemiec 12 days ago
some rough telemetry based on data I have for combinations over 30 customers (a customer may do more than 1 pattern):
CS 2 Kafka - 7
Kafka 2 Col - 12
Kafka 2 Kafka - 8
CS 2 Collection - 14
__
Joe Niemiec 12 days ago
so really Kafka to Collection and CS to Collection are the top dogs
__
aadesh 12 days ago
perf, that's super helpful
__
aadesh 11 days ago
re: repeatability, might take a bit more time to get to a system where we can easily repeat different setups for different stream pipelines
need to make various changes to the existing perf tooling infra (DSI specifically) to distinguish mongod vs mstreams, but looking into that more today so that we can get to a place where it's easy to do all this
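As a sketch of the repeatability goal, the run matrix could be captured as plain data so the same (workload, source, region) combinations can be replayed run after run; everything here (workload names, source setups, the run_workload hook) is hypothetical and not part of the existing DSI or Genny tooling:

```python
import itertools
import json

# Hypothetical run matrix; workload names and source setups are illustrative.
WORKLOADS = ["passthrough_1kb", "tumbling_window_1kb", "passthrough_8kb"]
SOURCES = ["in_memory", "kafka_same_region", "kafka_cross_region"]
STREAMS_REGION = "us-east-1"

def run_workload(workload, source):
    """Placeholder hook: run the given workload against the given source
    setup and return the observed throughput in docs/sec."""
    raise NotImplementedError("wire this up to the actual perf tooling")

def run_matrix():
    results = []
    for workload, source in itertools.product(WORKLOADS, SOURCES):
        results.append({"workload": workload,
                        "source": source,
                        "streams_region": STREAMS_REGION,
                        "docs_per_sec": run_workload(workload, source)})
    # Persist results so successive runs can be compared like for like.
    with open("streams_throughput.json", "w") as fh:
        json.dump(results, fh, indent=2)
    return results
```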
- related to: SERVER-81719 Streams: Integrate mongostream into performance testing infrastructure (Closed)