- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: None
- Atlas Streams
- Sprint 60, Sprint 62
There have been a few prod examples of poor or unhelpful error messages when connecting to Kafka.
We should investigate why, since we already have code intended to return more detailed error information: https://github.com/10gen/mongo/blob/master/src/mongo/db/modules/enterprise/src/streams/exec/kafka_event_callback.cpp#L48
There is likely an intermittent issue we need to fix in the "append recent errors" logic: https://github.com/10gen/mongo/blob/master/src/mongo/db/modules/enterprise/src/streams/exec/kafka_partition_consumer.cpp#L575
- We should use Splunk to check whether librdkafka is printing any useful error information that should have been included by the "append recent errors" logic: https://github.com/10gen/mongo/blob/master/src/mongo/db/modules/enterprise/src/streams/exec/kafka_event_callback.cpp#L60
- If so, why are those error messages not picked up by the "appendRecentErrorsToStatus" logic? Do we need to wait about a second for those error messages to come in?
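To make the "do we need to wait" question concrete: if the error callback and the status-building path run on different threads, the status can be built before the client library has delivered its error lines. Below is a minimal sketch of a bounded wait for that case. It is not the code in kafka_event_callback.cpp; `RecentErrorBuffer`, `onError`, `appendRecentErrors`, the buffer size, and the message format are all hypothetical names and values chosen for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a callback object that buffers recent client errors.
class RecentErrorBuffer {
public:
    // Assumed to be called from whatever thread delivers client error/log events.
    void onError(std::string message) {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            if (_errors.size() == kMaxErrors) {
                _errors.pop_front();  // Keep only the most recent errors.
            }
            _errors.push_back(std::move(message));
        }
        _cv.notify_all();
    }

    // Returns `reason` with any buffered errors appended. If the buffer is
    // empty, waits up to `maxWait` for the first error to arrive.
    std::string appendRecentErrors(const std::string& reason,
                                   std::chrono::milliseconds maxWait) {
        assert(maxWait.count() >= 0);
        std::unique_lock<std::mutex> lock(_mutex);
        _cv.wait_for(lock, maxWait, [this] { return !_errors.empty(); });
        std::string out = reason;
        for (const auto& error : _errors) {
            out += ": " + error;
        }
        return out;
    }

private:
    static constexpr std::size_t kMaxErrors = 5;  // Illustrative limit.
    std::mutex _mutex;
    std::condition_variable _cv;
    std::deque<std::string> _errors;
};
```

The wait is bounded so the failure path cannot block indefinitely. The trade-off is up to `maxWait` of extra latency whenever no detailed error ever arrives, so the Splunk check should come first to confirm that late-arriving errors are actually the cause.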
Here is a recent repro: https://mongodb.slack.com/archives/C07HH6A8M55/p1730824228637259
Note this can also repro for VPC peering errors. Those will have better error messages after https://github.com/10gen/mongo/pull/26113 .
- related to: SERVER-92676 duplicate (Closed)