- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Networking
- Fully Compatible
- ALL
- v5.0
- Service Arch 2020-11-30, Service Arch 2021-05-17, Service Arch 2021-05-31, Service Arch 2021-06-14, Service Arch 2021-06-28
- (copied to CRM)
- 37
- 5
We can throw NetworkInterfaceExceededTimeLimit in NetworkInterfaceTL::CommandStateBase::setTimer() if the connection future is not immediately ready. This is a problem in both the exhaust and non-exhaust paths: if the connection future isn't immediately ready, we call 'trySend()' in a getAsync continuation that runs on the reactor (there is one such call site in the non-exhaust case and one in the exhaust case).
I had previously tried to catch this error in the exhaust case in SERVER-48493, but after the BF recurred I realized I hadn't fully diagnosed the issue and that the fix actually does nothing. I know we decided to throw here to save ourselves some unnecessary work, but I'm not sure we considered that this could crash the server in some cases. Since this affects both the exhaust and non-exhaust paths, I'll leave the broader solution up to Service Arch to decide. As part of this ticket, it would be nice to essentially revert the changes I made in SERVER-48493, because they don't actually do anything.
Acceptance criteria:
Write a repro.
Handle the NetworkInterfaceExceededTimeLimit error gracefully without letting the server crash.
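To illustrate the failure mode and one possible way to handle it gracefully, here is a minimal standalone sketch. It is not the actual NetworkInterfaceTL code and not necessarily the fix that was adopted: SketchStatus, TimeLimitExceeded, setTimerOrThrow and onConnectionReady are hypothetical names, and the sketch assumes the continuation is effectively noexcept, so that an escaping exception would terminate the process. The idea is to convert the exception into an error status inside the continuation and hand that status to the completion callback.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a status object; this is NOT MongoDB's Status class.
struct SketchStatus {
    bool ok;
    std::string reason;
};

// Hypothetical stand-in for the NetworkInterfaceExceededTimeLimit error.
struct TimeLimitExceeded : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Stand-in for the timer setup step: throws if the deadline has already passed
// by the time the connection becomes ready.
void setTimerOrThrow(long long nowMs, long long deadlineMs) {
    if (nowMs >= deadlineMs) {
        throw TimeLimitExceeded("deadline passed before the connection was ready");
    }
}

// Stand-in for the continuation that runs once the connection is ready.
// It is noexcept, so an exception escaping it would call std::terminate.
// The try/catch turns the time-limit exception into an error status, and the
// completion callback is invoked exactly once with the outcome.
void onConnectionReady(long long nowMs,
                       long long deadlineMs,
                       const std::function<void(SketchStatus)>& onDone) noexcept {
    assert(onDone);  // the caller must supply a completion callback

    SketchStatus status{true, ""};
    try {
        setTimerOrThrow(nowMs, deadlineMs);
    } catch (const TimeLimitExceeded& ex) {
        status = {false, ex.what()};
    }
    onDone(status);
}
```

In this sketch, calling onConnectionReady(100, 50, callback) delivers a failed SketchStatus to the callback instead of terminating, while onConnectionReady(10, 50, callback) delivers a successful one. Removing the try/catch would make the first call abort the process, which mirrors the crash described above.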
depends on
- SERVER-57893 Make rsm_horizon_change.js resilient to network failures (Closed)
is related to
- SERVER-49434 Mark all 'getAsync' calls as noexcept in network_interface_tl.cpp (Closed)
- SERVER-89093 Remove catchingInvoke and gSuppressNetworkInterfaceTransportLayerExceptions (Closed)
related to
- SERVER-91831 uassert in NetworkInterfaceTL::setTimer can crash the server (Closed)