Type: Bug
Resolution: Fixed
Priority: Major - P3
Affects Version/s: None
Component/s: None
AWS Lambda (and likely other FaaS services) pauses the app process when it is idle and resumes it later on demand, when a new request comes in. This pause/resume behavior causes SDAM heartbeats to time out, which then clears the pool and marks the server Unknown. The result is connection churn and increased latency, since the servers need to be rediscovered and all pooled connections need to be recreated.
This behavior can be simulated locally using SIGSTOP + SIGCONT:
2022-03-25 14:40:38,915 INFO event_loggers Heartbeat sent to server ('localhost', 27018)
2022-03-25 14:40:38,916 INFO event_loggers Heartbeat sent to server ('localhost', 27019)
[1] + 93208 suspended (signal) python repro-DRIVERS-2246.py
$ sleep 60
$ kill -SIGCONT 93208
2022-03-25 14:42:16,835 WARNING event_loggers Heartbeat to server ('localhost', 27017) failed with error localhost:27017: timed out
2022-03-25 14:42:16,835 WARNING event_loggers Heartbeat to server ('localhost', 27018) failed with error localhost:27018: timed out
2022-03-25 14:42:16,836 INFO event_loggers Heartbeat sent to server ('localhost', 27017)
2022-03-25 14:42:16,836 INFO event_loggers Heartbeat sent to server ('localhost', 27018)
2022-03-25 14:42:16,836 WARNING event_loggers Heartbeat to server ('localhost', 27019) failed with error localhost:27019: timed out
2022-03-25 14:42:16,837 INFO event_loggers Heartbeat sent to server ('localhost', 27019)
We can mitigate this issue by performing one non-blocking check to see if the socket is readable after a timeout:
2022-03-29 15:24:52,344 INFO event_loggers Heartbeat sent to server ('localhost', 27017)
[1] + 30988 suspended (signal) python3.10 repro-DRIVERS-2246.py
$ sleep 30 && kill -SIGCONT 30988
2022-03-29 15:25:37,944 INFO event_loggers Heartbeat to server ('localhost', 27017) succeeded with reply {'topologyVersion': ...
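The idea behind the mitigation can be sketched as follows. This is an illustrative sketch only, not the driver's actual implementation: `wait_for_read`, its parameters, and the polling interval are hypothetical, and it assumes the read path enforces its own deadline with a monotonic clock. When the deadline has passed, the function performs a single zero-timeout `select` before giving up, so a reply that arrived while the process was paused is still consumed instead of being reported as a timeout.

```python
import select
import socket
import time


def wait_for_read(sock, deadline, poll_interval=0.05):
    """Block until sock is readable or the deadline (a time.monotonic()
    value) passes. Hypothetical helper, not PyMongo's API."""
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # The deadline has passed, possibly only because the process
            # was paused. Do one non-blocking check before failing: the
            # reply may already be sitting in the socket buffer.
            readable, _, _ = select.select([sock], [], [], 0)
            if readable:
                return
            raise socket.timeout("timed out")
        readable, _, _ = select.select(
            [sock], [], [], min(remaining, poll_interval)
        )
        if readable:
            return


if __name__ == "__main__":
    a, b = socket.socketpair()
    b.sendall(b"reply")                    # reply arrives "during the pause"
    expired = time.monotonic() - 1.0       # deadline already in the past
    wait_for_read(a, expired)              # returns instead of raising
    print(a.recv(5))
    a.close()
    b.close()
```

If the socket is not readable at the final check, the timeout is raised as before, so a genuinely unresponsive server is still detected.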
causes:
- PYTHON-3191 Test Failure - Versioned API requireApiVersion1 (Closed)
is related to:
- PYTHON-2448 TLS handshake fails sometimes when running on AWS Lambda (Closed)
related to:
- DRIVERS-2246 Heartbeat build up with streaming protocol when driver process is stopped (FAAS) (Closed)
- DRIVERS-1598 Solve for serverless/lambda connection pool issues (Development Complete)