Pasting Max's findings:
The problematic area is in https://github.com/mongodb/mongo/blob/r5.0.19/src/mongo/db/s/resharding/resharding_oplog_fetcher.cpp#L202-L203. When the code was written, it was likely assumed that because the function returns a StatusWith<> result, it would not throw an exception. However, it seems the function can also throw. When it does, the exception propagates out as an error instead of being swallowed and retried by returning true.
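To illustrate the failure mode described above, here is a minimal C++ sketch. It is not the actual fetcher code: Status, StatusWith, FetchFn, and iterate are hypothetical stand-ins written for this example, and the policy of retrying on every error is an assumption taken from the "return true" description. It only shows how wrapping the call in try/catch makes a thrown exception lead to the same retry outcome as a non-OK status.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins, NOT the real mongo::Status / mongo::StatusWith.
struct Status {
    bool ok;
    std::string reason;
};

template <typename T>
struct StatusWith {
    Status status;
    std::optional<T> value;
};

// Hypothetical fetch step: the return type suggests that errors come back
// as a status, but the callable is still free to throw.
using FetchFn = std::function<StatusWith<int>()>;

// Returns true when the caller should retry, false when the fetch succeeded.
// The try/catch maps a thrown exception to the same "retry" outcome as a
// non-OK status instead of letting it escape the retry loop.
bool iterate(const FetchFn& fetch) {
    try {
        StatusWith<int> result = fetch();
        if (!result.status.ok) {
            return true;  // error reported through the status: retry
        }
        return false;  // success: stop retrying
    } catch (const std::exception&) {
        return true;  // error reported through an exception: also retry
    }
}
```

A caller would loop while iterate(fetch) returns true. Whether every exception should really be retried, rather than only transient ones, is a separate decision that this sketch does not address.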
The ReshardingRecipientService should also retry on transient errors in the NetworkTimeoutError category in any retry loop. Since the change will be made in resharding_future_util.h, this improvement should affect all code that uses resharding::withAutomaticRetry.
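The proposed behavior can be sketched as a retry predicate that treats timeout-category errors as retriable. This is an illustration only and does not reproduce resharding::withAutomaticRetry: the ErrorCategory enum, shouldRetry, and runWithRetry are hypothetical names, and the bounded attempt count is an assumption made to keep the example finite.

```cpp
#include <cassert>
#include <functional>

// Hypothetical error categories, not the real MongoDB ErrorCategory enum.
enum class ErrorCategory { None, RetriableNetwork, NetworkTimeout, NonRetriable };

// Retry predicate: the point of the change is that the timeout category is
// accepted here alongside the other retriable network errors.
bool shouldRetry(ErrorCategory category) {
    return category == ErrorCategory::RetriableNetwork ||
        category == ErrorCategory::NetworkTimeout;
}

// Runs op until it returns a category that should not be retried, or until
// maxAttempts is exhausted. Returns the last category observed
// (None if maxAttempts <= 0, since op is never invoked in that case).
ErrorCategory runWithRetry(const std::function<ErrorCategory()>& op, int maxAttempts) {
    ErrorCategory last = ErrorCategory::None;
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        last = op();
        if (!shouldRetry(last)) {
            return last;
        }
    }
    return last;
}
```

Because the predicate lives in one place, every caller of the shared helper picks up the new category without changes at the call sites, which is the effect the ticket expects from making the change in the shared header.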
is related to:
- SERVER-58389 Capture NetworkInterfaceExceededTimeLimit and MaxTimeMSExpired errors in resharding participants (Closed)
- SERVER-72055 NetworkInterfaceTL should by default return a retryable error when it times out waiting to acquire a connection (Closed)
related to:
- SERVER-80020 The exhaustiveFindOnConfig() method should retry on NetworkInterfaceExceededTimeLimit errors (Backlog)