- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Internal Code
- None
- Fully Compatible
- ALL
- v4.4
- Service Arch 2022-07-25, Service Arch 2022-08-08, Service Arch 2022-08-22, Service Arch 2022-09-05
- 0
A hedged operation that fails with NetworkInterfaceExceededTimeLimit can cause the original operation to fail as well. Consider the following example (reproducible on v4.4):
- Mongos attempts to hedge a read operation.
- The hedged operation, running on a shard server, needs to query the config server (e.g., as part of waitForReadConcern).
- The config server is temporarily unavailable (e.g., a step-down is in progress), thus it cannot accept new connections.
- The hedged operation's query to the config server times out (i.e., NetworkInterfaceExceededTimeLimit).
- The hedged operation completes and returns the time-out error to the mongos server.
- Since the error is not MaxTimeMSExceeded, mongos kills the outstanding operation and returns the non-okay status to the caller (see here).
- The operation fails, while it would have (eventually) succeeded without hedging.
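The sequence above can be reproduced with a small simulation. This is only a sketch of the decision rule as the steps describe it, not the mongos implementation: the `Response` type, the `resolve_as_described` function, and the arrival times are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Response:
    source: str           # "original" or "hedge"
    arrival_ms: int       # hypothetical time at which the router sees the response
    error: Optional[str]  # None means the operation succeeded


def resolve_as_described(responses: List[Response]) -> Response:
    """Model of the rule in the steps above (an assumption, not server code):
    the first response that is not MaxTimeMSExceeded decides the request,
    and any operation still outstanding is killed."""
    ordered = sorted(responses, key=lambda r: r.arrival_ms)
    for response in ordered:
        if response.error != "MaxTimeMSExceeded":
            return response
    # Every response was MaxTimeMSExceeded: report the last one.
    return ordered[-1]


# The hedge times out against the config server before the original finishes.
responses = [
    Response("hedge", 100, "NetworkInterfaceExceededTimeLimit"),
    Response("original", 400, None),
]
winner = resolve_as_described(responses)
print(winner.source, winner.error)
# prints: hedge NetworkInterfaceExceededTimeLimit
```

In this model the request fails even though the original operation would have succeeded 300 ms later, which is the behavior the ticket reports.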
This ticket, or its sub-tasks, should:
- Check if this issue also applies to newer branches (post v4.4).
- Clarify the semantics for failing hedged operations (e.g., what errors may be ignored on hedged operations).
- Fix the implementation to honor the semantics.
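One possible shape for such semantics is sketched below. It is purely illustrative: the ticket does not state which errors should be ignorable, so the `IGNORABLE_ON_HEDGE` set and the `should_finish_request` function are hypothetical and do not describe the fix that was shipped.

```python
from typing import Optional

# Hypothetical policy set. Which errors belong here is exactly what the
# ticket asks to clarify; these two entries are an assumption.
IGNORABLE_ON_HEDGE = frozenset({
    "MaxTimeMSExceeded",
    "NetworkInterfaceExceededTimeLimit",
})


def should_finish_request(is_hedge: bool,
                          error: Optional[str],
                          others_outstanding: bool) -> bool:
    """Decide whether a response should complete the caller's request."""
    if error is None:
        return True  # any success completes the request
    if is_hedge and others_outstanding and error in IGNORABLE_ON_HEDGE:
        return False  # drop the hedge's error and keep waiting
    return True  # propagate the error to the caller


# A hedge timing out while the original is still running is ignored.
print(should_finish_request(True, "NetworkInterfaceExceededTimeLimit", True))
# prints: False
```

Under this assumed policy, a hedge error is still reported when nothing else is outstanding, so the caller always receives some response.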
- is related to: SERVER-68704 Clarify the semantics of failing hedged operations (Closed)
- related to: SERVER-69121 Update FCV version for `hedged_reads.js` (Closed)
- related to: SERVER-69402 Update FCV version for ttl_index_options.js (Closed)