- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Fully Compatible
- ALL
- v6.1, v6.0, v5.0
- 3
The resharding coordinator queries every recipient shard for an estimate of the time remaining in the active resharding operation. It does so in CoordinatorCommitMonitor::queryRemainingOperationTimeForRecipients, using the command _shardsvrReshardingOperationTime.
The recipient shards handle that command here. The function ReshardingMetrics::getOperationRemainingTime can return boost::none. In that case, the "recipientMillis" field is omitted from the recipient's response to the coordinator.
If all participants were to omit this field, these if statements would never be entered. When the participant then reads the max remaining time (here), remainingTimes.max would still be 0 and remainingTimes.min would still be Milliseconds::max, the values they were initialized with. In particular, the invariant mentioned here would fail.
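The aggregation described above can be sketched in a few lines. This is a minimal, standalone illustration and not the server code: RemainingTimes and aggregateSkippingMissing are hypothetical names, std::optional stands in for boost::optional, and std::chrono::milliseconds stands in for the server's Milliseconds type. It only shows that skipping omitted estimates leaves the initial values in place.

```cpp
#include <algorithm>
#include <chrono>
#include <optional>
#include <vector>

// Stand-in for the server's Milliseconds type (assumption for this sketch).
using Milliseconds = std::chrono::milliseconds;

// Hypothetical aggregate, initialized the way the ticket describes:
// max starts at 0 and min starts at Milliseconds::max.
struct RemainingTimes {
    Milliseconds min = Milliseconds::max();
    Milliseconds max = Milliseconds{0};
};

// Folds the recipients' replies into min/max. A reply without an estimate
// (std::nullopt) is skipped, mirroring the "if statements" that are not
// entered when the field is omitted.
RemainingTimes aggregateSkippingMissing(
    const std::vector<std::optional<Milliseconds>>& replies) {
    RemainingTimes times;
    for (const auto& reply : replies) {
        if (reply) {
            times.min = std::min(times.min, *reply);
            times.max = std::max(times.max, *reply);
        }
    }
    return times;
}
```

If every reply is empty, the result still holds max == 0 and min == Milliseconds::max(), so any check that expects min <= max would not hold, and a caller that looks only at max cannot tell "no estimate" apart from "0 ms remaining".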
The effect is that a recipient returning an empty "remainingMillis" field is equivalent to it returning "remainingMillis: 0". This is a bug in at least one case: when _shardsvrReshardingOperationTime is run against a recipient shard before that shard has restored its metrics (during a step-up).
As a result, the coordinator, believing that recipientMillis is under the threshold for all recipients, would begin the critical section prematurely. The resharding operation would then fail with ReshardingCriticalSectionTimeout if the recipient above does not manage to enter the "strict-consistency" state within the timeout.
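One way to avoid the premature decision is to treat a missing estimate as "unknown" rather than as zero. The sketch below is an assumption about how such a guard could look, not the fix that was actually committed: allRecipientsUnderThreshold, its parameters, and the use of std::optional and std::chrono::milliseconds are all hypothetical.

```cpp
#include <chrono>
#include <optional>
#include <vector>

// Stand-in for the server's Milliseconds type (assumption for this sketch).
using Milliseconds = std::chrono::milliseconds;

// Hypothetical guard: returns true only when every recipient reported an
// estimate and each estimate is at or below the threshold. A missing
// estimate (std::nullopt) or an empty reply set means "not ready".
bool allRecipientsUnderThreshold(
    const std::vector<std::optional<Milliseconds>>& replies,
    Milliseconds threshold) {
    if (replies.empty()) {
        return false;  // nothing to base a decision on
    }
    for (const auto& reply : replies) {
        if (!reply) {
            return false;  // unknown remaining time, e.g. metrics not restored yet
        }
        if (*reply > threshold) {
            return false;  // this recipient still has too much work left
        }
    }
    return true;
}
```

With this shape, a recipient that has not yet restored its metrics after a step-up keeps the coordinator waiting instead of letting it start the critical section.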
is related to
- SERVER-67650 Resharding recipient can return remainingOperationTimeEstimatedSecs=0 when the oplog applier hasn't caught up with the oplog fetcher (Closed)
- SERVER-68783 Recipient shard may incorrectly return 0 milliseconds remaining in resharding (Closed)