- Type: Task
- Resolution: Done
- Priority: Major - P3
- Component/s: CSOT
In the PR review for the timeout spec, matt.dale provided a suggestion that was never resolved. To quote:
Using the 90th percentile RTT latency will result in some operations that are likely to complete being cancelled instead.
Let's consider a Find operation that completes quickly on the server (i.e. <1ms) running on an Atlas cluster, so almost all of the latency is from the network round trip. There are 3 buckets of timing conditions the driver will encounter:
- The client-side deadline is greater than (now + max observed RTT); the operation will almost certainly complete before the deadline.
- The client-side deadline is between [(now + min observed RTT), (now + max observed RTT)]; the operation may complete or may fail due to timeout.
- The client-side deadline is less than (now + min observed RTT); the operation will almost certainly fail due to timeout.
The operations we're interested in are in bucket 2. By assuming the network round trip will take the 90th percentile observed RTT, we may cancel operations that have a nearly 90% chance of completing before the deadline. Cancelling operations is dangerous because we're actually preventing the driver from doing work. We should instead bias toward cancelling as few operations that have a reasonable chance of completing as possible, in exchange for also letting more operations time out.
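The three buckets above can be sketched as a small classifier. This is only an illustration: `Bucket` and `classify` are hypothetical names, not part of any driver, and the time left until the deadline is assumed to be already computed in milliseconds.

```python
from enum import Enum


class Bucket(Enum):
    LIKELY_COMPLETES = 1   # deadline > now + max observed RTT
    UNCERTAIN = 2          # deadline within [now + min RTT, now + max RTT]
    LIKELY_TIMES_OUT = 3   # deadline < now + min observed RTT


def classify(remaining_ms: float, min_rtt_ms: float, max_rtt_ms: float) -> Bucket:
    """Place an operation in one of the three buckets.

    remaining_ms is (deadline - now); all names here are hypothetical.
    """
    if remaining_ms > max_rtt_ms:
        return Bucket.LIKELY_COMPLETES
    if remaining_ms < min_rtt_ms:
        return Bucket.LIKELY_TIMES_OUT
    return Bucket.UNCERTAIN
```

Any cancellation threshold above the minimum RTT, such as the 90th percentile, cancels part of the `UNCERTAIN` bucket, which is the concern raised in the comment.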
I propose that we change the cancellation threshold to the 5-minute minimum RTT (i.e. minimum RTT observed in the last 5 minutes) instead of the 90th percentile. While the 10th or 25th percentile more closely match the "reasonable chance of succeeding" threshold, the added complexity of using the t-digest algorithm doesn't seem to justify the small optimization.
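A minimal sketch of what tracking a "5-minute minimum RTT" could look like. The class name `MinRTTWindow`, its methods, and the choice to store every sample in a deque are assumptions for illustration; actual drivers may bound the sample count or structure this differently.

```python
import time
from collections import deque
from typing import Optional


class MinRTTWindow:
    """Hypothetical tracker for the minimum RTT seen in a sliding time window."""

    def __init__(self, window_seconds: float = 300.0) -> None:
        self._window = window_seconds
        self._samples = deque()  # (timestamp_seconds, rtt_ms), oldest first

    def _evict(self, now: float) -> None:
        # Drop samples that are older than the window.
        while self._samples and now - self._samples[0][0] > self._window:
            self._samples.popleft()

    def add_sample(self, rtt_ms: float, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._samples.append((now, rtt_ms))
        self._evict(now)

    def min_rtt(self, now: Optional[float] = None) -> Optional[float]:
        """Return the smallest RTT in the window, or None if there are no samples."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        if not self._samples:
            return None
        return min(rtt for _, rtt in self._samples)
```

Unlike a percentile estimate, this needs no t-digest or similar structure: a list of recent samples and a `min` are enough.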
We should reconsider the 90th percentile RTT heuristic used for preventing sending an operation and setting maxTimeMS.
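To show how the choice of RTT estimate affects both decisions, here is a rough sketch. It assumes that maxTimeMS is derived by subtracting the RTT estimate from the remaining timeout, and that the operation is not sent when the result is not positive. The function name `compute_max_time_ms` and the sample numbers are hypothetical.

```python
from typing import Optional


def compute_max_time_ms(remaining_ms: float, rtt_estimate_ms: float) -> Optional[float]:
    """Return the maxTimeMS to attach, or None if the operation should not be sent.

    Assumed rule: maxTimeMS = remaining timeout - RTT estimate, and a
    non-positive result means the driver fails the operation client-side.
    """
    max_time_ms = remaining_ms - rtt_estimate_ms
    if max_time_ms <= 0:
        return None
    return max_time_ms


# Hypothetical numbers: 30 ms left, min RTT 10 ms, 90th percentile RTT 35 ms.
with_p90 = compute_max_time_ms(30, 35)   # None: the operation is never sent
with_min = compute_max_time_ms(30, 10)   # 20: sent with a 20 ms server-side budget
```

With the 90th percentile estimate the operation is rejected even though most observed round trips would fit in the remaining 30 ms; with the minimum RTT it is sent and may still time out.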
is related to
- NODE-3078 Client Side Operations Timeout (Development Complete)

split to
- NODE-5825 Add minRoundTripTime field and calculation to Monitor (Closed)
- GODRIVER-2762 Use minimum RTT for CSOT maxTimeMS calculation instead of 90th percentile (Closed)
- PYTHON-3616 Use minimum RTT for CSOT maxTimeMS calculation instead of 90th percentile (Closed)