- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Sharding
- Fully Compatible
- ALL
- v5.0
- Sharding 2021-07-26, Sharding 2021-08-09
- 1
In resharding, shards (donor and recipient) call into the config server to update the coordinator document. NetworkInterfaceExceededTimeLimit and MaxTimeMSExpired errors are not considered retriable, but they are definitely reachable: these commands have a 30-second timeout, and one of the listed errors is thrown when that timeout is reached. These errors escape both the generic command retry logic and the resharding-specific transient-error retry logic, and ultimately cause an fassert on whichever node is running resharding.
The solution here is to figure out the best place to swallow and retry these errors.
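As a rough illustration of "swallow and retry", the sketch below wraps an operation in a bounded retry loop whose retriability predicate also accepts the two timeout errors named above. It is a language-neutral sketch written in Python, not the server's actual code: `CommandError`, `run_with_retry`, `is_retriable_for_resharding`, and the attempt and backoff parameters are all hypothetical names chosen for this example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CommandError(Exception):
    """Hypothetical stand-in for a failed command; carries the error's code name."""

    def __init__(self, code_name: str):
        super().__init__(code_name)
        self.code_name = code_name


# The two timeout errors from the ticket, treated as retriable in this sketch.
RETRIABLE_TIMEOUT_ERRORS = {"NetworkInterfaceExceededTimeLimit", "MaxTimeMSExpired"}


def is_retriable_for_resharding(err: CommandError) -> bool:
    """Assumed predicate: only the two timeout errors are retried here."""
    return err.code_name in RETRIABLE_TIMEOUT_ERRORS


def run_with_retry(
    op: Callable[[], T],
    is_retriable: Callable[[CommandError], bool] = is_retriable_for_resharding,
    max_attempts: int = 5,
    base_delay_s: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run op, retrying errors accepted by is_retriable with exponential backoff.

    Non-retriable errors, and the error from the final attempt, propagate.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except CommandError as err:
            if not is_retriable(err) or attempt == max_attempts:
                raise
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise RuntimeError("unreachable: max_attempts must be at least 1")
```

Where such a loop belongs (in the command-sending layer, or in the resharding-specific retry wrapper) is exactly the open question in this ticket; the sketch only shows the shape of the predicate change.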
- related to
  - SERVER-79771 Make Resharding Operation Resilient to NetworkInterfaceExceededTimeLimit (Closed)