- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Sharding
- Fully Compatible
- ALL
- Sharding 2021-05-03
- 1
ReshardingTxnCloner returns true from its until() lambda when the cancellation token isn't canceled, which ends the AsyncTry retry loop. This means that a Cancellation or NotPrimary error returned by the remote donor shard causes the local recipient shard to stop cloning config.transactions records. The !cancelToken.isCanceled() condition should really be cancelToken.isCanceled() (see ReshardingCollectionCloner for comparison).
    if (status.isA<ErrorCategory::CancellationError>() ||
        status.isA<ErrorCategory::NotPrimaryError>()) {
        // Cancellation and NotPrimary errors indicate the primary-only service Instance
        // will be shut down or is shutting down now - provided the cancelToken is also
        // canceled. Otherwise, the errors may have originated from a remote response rather
        // than the shard itself.
        //
        // Don't retry when primary-only service Instance is shutting down.
        return !cancelToken.isCanceled();
    }
This pattern with AsyncTry is fairly common throughout the resharding code. We should consider making a common utility to express this logic. The withAutomaticRetry() function added as part of SERVER-51606 switched to a pattern that avoids checking the cancellation token in the until() lambda because the AsyncTry always checks the cancellation token on its own anyway.
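A rough sketch of what such a common utility could look like, using the corrected condition. This is not the server code: ErrorKind, CancelToken, shouldStopRetrying() and runWithRetry() are hypothetical stand-ins for Status, CancellationToken and the AsyncTry/until() machinery, and treating every other error as non-retryable is a simplification made only for this sketch.

```cpp
#include <atomic>
#include <functional>

// Hypothetical stand-in for the error categories on a Status.
enum class ErrorKind { kOk, kCancellation, kNotPrimary, kOther };

// Hypothetical stand-in for a cancellation token.
class CancelToken {
public:
    void cancel() { _canceled.store(true); }
    bool isCanceled() const { return _canceled.load(); }

private:
    std::atomic<bool> _canceled{false};
};

// Mirrors the meaning of an until() predicate: returning true stops the retry loop.
bool shouldStopRetrying(ErrorKind kind, const CancelToken& token) {
    if (kind == ErrorKind::kOk) {
        return true;  // Success, nothing left to retry.
    }
    if (kind == ErrorKind::kCancellation || kind == ErrorKind::kNotPrimary) {
        // Stop only when the local token is canceled. If it is not, the error
        // most likely came from a remote response, so keep retrying.
        return token.isCanceled();
    }
    return true;  // Assumption for this sketch: all other errors are not retried.
}

// Minimal synchronous retry loop standing in for AsyncTry; returns the attempt count.
int runWithRetry(const std::function<ErrorKind()>& attempt,
                 const CancelToken& token,
                 int maxAttempts) {
    int attempts = 0;
    while (attempts < maxAttempts) {
        ErrorKind result = attempt();
        ++attempts;
        if (shouldStopRetrying(result, token)) {
            break;
        }
    }
    return attempts;
}
```

With the original !cancelToken.isCanceled() condition, the loop in this sketch would exit on the first remote NotPrimary error while the local token is still live, which is the behavior this ticket reports.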