-
Type: Task
-
Resolution: Won't Fix
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
Atlas Streams
-
Sprint 46
A few stream processors in prod are hitting errors like: "No suitable servers found (`serverSelectionTryOnce` set): [socket timeout calling hello on 'mycluster-shard-00-01.zywgx.mesh.mongodb.net:30460'] [socket timeout calling hello on 'mycluster-shard-00-00.zywgx.mesh.mongodb.net:30460'] [socket timeout calling hello on 'mycluster-shard-00-02.zywgx.mesh.mongodb.net:30460']: generic server error"
- Investigate the root cause of this issue. Hopefully it is a user-side misconfiguration, such as incorrect auth supplied by the user or a cluster that no longer exists.
- Some starting points for the investigation:
- https://wiki.corp.mongodb.com/display/RI/Splunk#Splunk-Tracingarequestfromstreamstoanatlascluster(note:justreplacebaaswithstreams)
- https://cloud.mongodb.com/admin/nds/groups
- This log that Erik added: https://github.com/10gen/mongohouse/pull/9217/files#diff-58a98c05fdce39bf649f3b88379ab600f499ae61c3303673e66839bdc77b3355R329
- Add a "troubleshooting guide" for investigating this error
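
As a possible first step for the troubleshooting guide, the sketch below pulls the host:port pairs out of an error message of the form quoted above and tries a plain TCP connection to each one. It uses only the Python standard library. The function names `extract_hosts`, `check_reachable`, and `diagnose` are hypothetical, and the regex assumes the `calling hello on '<host>:<port>'` wording seen in the error above. It only tests DNS resolution and TCP reachability; it does not speak the MongoDB wire protocol or check auth, and it is only meaningful when run from a host with the same network path as the stream processor.

```python
import re
import socket

# Assumption: each failed host appears as "calling hello on '<host>:<port>'",
# matching the error text quoted in this ticket.
HOST_RE = re.compile(r"calling hello on '([^':]+):(\d+)'")


def extract_hosts(error_message):
    """Return the (host, port) pairs named in a server selection error."""
    return [(host, int(port)) for host, port in HOST_RE.findall(error_message)]


def check_reachable(host, port, timeout=5.0):
    """Classify the result of a plain TCP connection attempt to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "tcp ok"
    except socket.gaierror:
        # The hostname did not resolve.
        return "dns failure"
    except socket.timeout:
        # No response within the timeout.
        return "timeout"
    except OSError as exc:
        # Connection refused, network unreachable, and similar errors.
        return f"error: {exc}"


def diagnose(error_message, timeout=5.0):
    """Map each host:port in the error message to its reachability result."""
    return {
        f"{host}:{port}": check_reachable(host, port, timeout)
        for host, port in extract_hosts(error_message)
    }
```

Calling `diagnose(error_text)` with the full error string returns one entry per host. As a rough reading, a "dns failure" for every host might point toward a cluster that no longer exists, while "tcp ok" would suggest looking further up the stack; both interpretations are assumptions to confirm against the Splunk traces and the admin page listed above.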
Stream Processor dashboards:
Overall Ops Dashboard (where I initially noticed these errors):