TenantMigrationDonorService might start waiting for forgetMigration without aborting the migration upon recipient errors.

Filing this bug after a recent production issue. By design, shard merge is not resilient to recipient failovers or restarts. When a recipient failover or restart occurs, the shardMergeRecipientService can return an error in either ErrorCategory::NotPrimaryError or ErrorCategory::isShutdownError. The TenantMigrationDonorService assumes that any such error always originates on the donor side rather than the recipient side. Consequently, it does not abort the migration and goes straight to waiting for forgetMigration.
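The decision at fault can be modeled as a tiny function. The sketch below is a simplified illustration, not MongoDB code: `ErrorOrigin`, `DonorAction`, `decideCurrent`, and `decideFixed` are hypothetical names, and it assumes the donor can tell whether an error came from itself or from the recipient. It contrasts the behavior described above (the origin is ignored) with one possible fix (skip the abort only for donor-side errors).

```cpp
#include <cassert>

// Hypothetical, simplified model of the donor's error handling described in
// this issue. None of these names exist in the MongoDB source tree.
enum class ErrorCategory { kNotPrimaryError, kShutdownError, kOther };
enum class ErrorOrigin { kDonor, kRecipient };

// What the donor does after the migration fails with an error.
enum class DonorAction {
    kAbortThenWaitForForget,  // mark the migration aborted, then wait for forgetMigration
    kWaitForForgetOnly,       // skip the abort, go straight to waiting for forgetMigration
};

bool isNotPrimaryOrShutdown(ErrorCategory category) {
    return category == ErrorCategory::kNotPrimaryError ||
           category == ErrorCategory::kShutdownError;
}

// Behavior described in the report: the origin is ignored, so a recipient
// failover or restart is mistaken for a donor-side error and the abort is skipped.
DonorAction decideCurrent(ErrorCategory category, ErrorOrigin /*origin*/) {
    return isNotPrimaryOrShutdown(category) ? DonorAction::kWaitForForgetOnly
                                            : DonorAction::kAbortThenWaitForForget;
}

// One possible fix (an assumption, not the actual patch): only skip the abort
// when the NotPrimary/Shutdown error really came from the donor itself.
DonorAction decideFixed(ErrorCategory category, ErrorOrigin origin) {
    if (origin == ErrorOrigin::kDonor && isNotPrimaryOrShutdown(category)) {
        return DonorAction::kWaitForForgetOnly;
    }
    return DonorAction::kAbortThenWaitForForget;
}
```

With this model, a recipient-side shutdown error makes `decideCurrent` skip the abort, while `decideFixed` aborts the migration before waiting for forgetMigration.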