- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: 5.0.0, 6.0.0, 7.0.0, 7.2.0, 8.0.0
- Component/s: None
- Cluster Scalability
- Fully Compatible
- ALL
ReshardingMetrics::getRecipientHighEstimateRemainingTimeMillis, which estimates the time remaining to apply resharding oplog entries, loses precision when it converts the elapsed time from milliseconds to seconds. The function calls getElapsed with Seconds as the template parameter, which performs a duration_cast<Seconds>. That cast truncates the fractional part, so any elapsed time under one second is reported as 0 seconds; for example, 800 milliseconds (0.8 seconds) becomes 0. Because estimateRemainingTime then receives an elapsed time of 0, the estimate it computes is incorrect.
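The truncation can be illustrated outside the server code. The sketch below is a minimal, standalone example that uses std::chrono in place of the server's own Duration types. The functions estimateRemaining and estimateTruncated are hypothetical names, and the proportional formula is an assumed simplification, not the actual estimateRemainingTime implementation. The input values mirror the reproducer in this ticket (50 ms elapsed, 300000 of 500000 entries applied).

```cpp
#include <chrono>
#include <cstdint>

using Milliseconds = std::chrono::milliseconds;
using Seconds = std::chrono::seconds;

// Hypothetical, simplified estimator (an assumption for illustration):
// remaining = elapsed * (total - done) / done, in integer milliseconds.
constexpr Milliseconds estimateRemaining(Milliseconds elapsed,
                                         std::int64_t done,
                                         std::int64_t total) {
    return done == 0 ? Milliseconds{0}
                     : Milliseconds{elapsed.count() * (total - done) / done};
}

// Same estimator, but the elapsed time is first cast to whole seconds,
// which mirrors the truncating conversion described in this ticket.
constexpr Milliseconds estimateTruncated(Milliseconds elapsed,
                                         std::int64_t done,
                                         std::int64_t total) {
    return estimateRemaining(std::chrono::duration_cast<Seconds>(elapsed),
                             done, total);
}

// duration_cast to a coarser unit truncates toward zero.
static_assert(std::chrono::duration_cast<Seconds>(Milliseconds{800}).count() == 0,
              "800 ms truncates to 0 s");

// With sub-second elapsed time, the truncating variant collapses to 0 ms.
static_assert(estimateTruncated(Milliseconds{50}, 300000, 500000) == Milliseconds{0},
              "truncated elapsed time yields a zero estimate");

// Keeping millisecond precision yields a non-zero estimate.
static_assert(estimateRemaining(Milliseconds{50}, 300000, 500000) == Milliseconds{33},
              "millisecond precision is preserved");
```

Under these assumptions, keeping the elapsed time in milliseconds until the final result is computed avoids the zero estimate; whether the actual fix takes this form is not stated in this ticket.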
The following test reproduces the issue:
TEST_F(ReshardingMetricsTest, RecipientReportsRemainingTimeLowElapsed) {
    auto metrics = createInstanceMetrics(getClockSource(), UUID::gen(), Role::kRecipient);
    const auto& clock = getClockSource();
    constexpr auto timeSpentCloning = Seconds(20);
    constexpr auto timeSpentApplying = Milliseconds(50);
    metrics->onOplogEntriesFetched(500000);
    metrics->setStartFor(TimedPhase::kCloning, clock->now());

    clock->advance(timeSpentCloning);
    metrics->setEndFor(TimedPhase::kCloning, clock->now());
    metrics->setStartFor(TimedPhase::kApplying, clock->now());
    clock->advance(timeSpentApplying);
    metrics->onOplogEntriesApplied(300000);

    auto report = metrics->getHighEstimateRemainingTimeMillis();
    ASSERT_EQ(report, Milliseconds{0});
}
- is related to: SERVER-84769 Resharding remainingOpTime algorithm doesn't work with low elapsedTime (Closed)