Type: Bug
Resolution: Fixed
Priority: Major - P3
Affects Version/s: None
Component/s: Sharding
Backwards Compatibility: Minor Change
Operating System: ALL
Backport Requested: v6.1
Sprint: Sharding 2022-08-22, Sharding 2022-09-05
3
When a recipient shard receives a _shardsvrReshardingOperationTime command from the resharding coordinator (used to query the estimated time remaining in a resharding operation), it calls ReshardingMetrics::getRecipientHighEstimateRemainingTimeMillis to compute that estimate. This function may incorrectly return 0 if the shard has just failed over and has not yet restored all of its metrics. That window exists because the metrics are restored in two separate places in the code, so for a while only some of them have been restored.
As a result, if a _shardsvrReshardingOperationTime command arrives during that window, the recipient may compute its estimate from partially restored metrics and report 0 remainingMillis, misleading the coordinator into believing that it can begin the critical section.
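The race described above can be sketched in a few lines. This is a minimal, hypothetical model, not MongoDB's ReshardingMetrics: the class RecipientMetrics, its two restore steps, the fields entries_applied and entries_fetched, and the estimate formula (unapplied entries divided by an assumed apply rate) are all illustrative assumptions. The buggy variant treats not-yet-restored fields as 0 and so reports 0 ms between the two restore steps; the fixed variant reports "no estimate" (None) until restoration is complete, so the response could omit remainingMillis rather than claim 0.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecipientMetrics:
    """Hypothetical stand-in for recipient-side resharding metrics."""

    # Assumed to be restored by the first restore step after failover.
    entries_applied: Optional[int] = None
    # Assumed to be restored by the second restore step after failover.
    entries_fetched: Optional[int] = None
    # Assumed oplog apply rate, in entries per millisecond.
    apply_rate_per_ms: float = 1.0

    def restore_step_one(self, applied: int) -> None:
        self.entries_applied = applied

    def restore_step_two(self, fetched: int) -> None:
        self.entries_fetched = fetched

    def is_fully_restored(self) -> bool:
        return self.entries_applied is not None and self.entries_fetched is not None


def buggy_high_estimate_ms(m: RecipientMetrics) -> int:
    # Treats any field that has not been restored yet as 0. Between the two
    # restore steps, fetched reads as 0, so the estimate collapses to 0.
    applied = m.entries_applied or 0
    fetched = m.entries_fetched or 0
    remaining = max(fetched - applied, 0)
    return int(remaining / m.apply_rate_per_ms)


def fixed_high_estimate_ms(m: RecipientMetrics) -> Optional[int]:
    # Reports "no estimate" until every input has been restored, so the
    # response can omit remainingMillis instead of claiming 0.
    if not m.is_fully_restored():
        return None
    remaining = max(m.entries_fetched - m.entries_applied, 0)
    return int(remaining / m.apply_rate_per_ms)


m = RecipientMetrics(apply_rate_per_ms=2.0)
m.restore_step_one(applied=900)  # failover: only the first step has run
print(buggy_high_estimate_ms(m))  # 0 (misleading)
print(fixed_high_estimate_ms(m))  # None
m.restore_step_two(fetched=1000)
print(fixed_high_estimate_ms(m))  # 50
```

Returning an explicit "unknown" value rather than a number keeps a half-restored shard from being indistinguishable from one that has genuinely caught up.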
This is related to SERVER-67653 but is a different bug: in SERVER-67653, the coordinator incorrectly treats an omitted remainingMillis field as 0 remainingMillis, whereas in this ticket the recipient itself incorrectly returns 0 remainingMillis.
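The distinction between the two tickets can be illustrated from the coordinator's side. This is a hedged sketch under stated assumptions: the helpers max_remaining_ms and can_begin_critical_section, the dict-shaped responses, and the threshold CRITICAL_SECTION_THRESHOLD_MS (and its value) are hypothetical and do not describe the real coordinator's policy. It shows that a coordinator can wait on an omitted remainingMillis (the SERVER-67653 concern), but cannot detect a recipient that wrongly reports 0 (this ticket).

```python
from typing import Mapping, Optional, Sequence

# Hypothetical threshold below which the coordinator may begin the critical
# section; the real policy and its value are not described in this ticket.
CRITICAL_SECTION_THRESHOLD_MS = 500


def max_remaining_ms(responses: Sequence[Mapping[str, int]]) -> Optional[int]:
    """Highest estimate across recipients, or None if any estimate is missing."""
    if not responses:
        return None
    highest = 0
    for resp in responses:
        if "remainingMillis" not in resp:
            # An omitted field means "no estimate yet". Defaulting it to 0
            # here would reproduce the SERVER-67653 failure mode.
            return None
        highest = max(highest, resp["remainingMillis"])
    return highest


def can_begin_critical_section(responses: Sequence[Mapping[str, int]]) -> bool:
    highest = max_remaining_ms(responses)
    return highest is not None and highest <= CRITICAL_SECTION_THRESHOLD_MS


# A recipient that omits remainingMillis makes the coordinator wait.
print(can_begin_critical_section([{"remainingMillis": 100}, {}]))  # False
# A recipient that wrongly reports 0 while still far behind (this ticket)
# passes the check, and the coordinator is misled.
print(can_begin_critical_section([{"remainingMillis": 100}, {"remainingMillis": 0}]))  # True
```

Once a recipient reports a concrete 0, nothing on the coordinator side distinguishes it from a recipient that has genuinely caught up, which is why the fix for this ticket has to be made on the recipient.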
- is depended on by
  - COMPASS-6094 Investigate changes in SERVER-68783: Recipient shard may incorrectly return 0 milliseconds remaining in resharding (Closed)
- related to
  - SERVER-67653 Resharding coordinator can incorrectly conclude that it can start the critical section although on one recipient the oplog applier hasn't caught up with the oplog fetcher (Closed)
  - SERVER-70079 remove optional_util::setOrAdd (Closed)