Type: Task
Resolution: Won't Do
Priority: Major - P3
None
Affects Version/s: None
Component/s: None
Serverless
0
Catch shard merge errors that arise outside of tenant_migration_recipient_service.cpp, e.g. a failure to copy a file or a failure to import in tenant_migration_recipient_op_observer.cpp and tenant_file_importer_service.cpp. These can occur on the recipient primary or on secondaries. Inform the recipient primary so that it aborts the merge, probably via voteCommitMigrationProgress (which I'm renaming to recipientVoteImportedFiles).
When the primary aborts the merge, all file importers must stop promptly. The file importer that encountered the error should probably stop immediately rather than wait for the primary to abort the migration.
Without this work, such failures will cause the migration to time out eventually instead of aborting promptly.
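The intended flow can be sketched in a few lines of standalone C++. This is only an illustration of the idea described above, not the server implementation: RecipientPrimary, FileImporter, ImportVote and MergeState are hypothetical names, and onVote merely stands in for a recipientVoteImportedFiles-style command handler. A failed vote from any node aborts the merge; the importer that hit the error stops right away, and every other importer stops the next time it checks the primary's state.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <set>
#include <string>
#include <vector>

// All names below are hypothetical and exist only for this sketch.
enum class MergeState { kImporting, kImportDone, kAborted };

struct ImportVote {
    std::string host;    // recipient node casting the vote
    bool success;        // false if a file copy or import failed
    std::string reason;  // error description when success is false
};

// Stands in for the recipient primary's merge state machine.
class RecipientPrimary {
public:
    explicit RecipientPrimary(std::size_t expectedVotes) : _expected(expectedVotes) {}

    // Models the handler for a recipientVoteImportedFiles-style command.
    MergeState onVote(const ImportVote& vote) {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_state != MergeState::kImporting) {
            return _state;  // outcome already decided; ignore late votes
        }
        if (!vote.success) {
            // Any failure aborts promptly instead of waiting for a timeout.
            _state = MergeState::kAborted;
            _abortReason = vote.host + ": " + vote.reason;
            return _state;
        }
        _succeeded.insert(vote.host);
        if (_succeeded.size() == _expected) {
            _state = MergeState::kImportDone;
        }
        return _state;
    }

    MergeState state() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _state;
    }

    std::string abortReason() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _abortReason;
    }

private:
    mutable std::mutex _mutex;
    std::size_t _expected;
    std::set<std::string> _succeeded;
    MergeState _state = MergeState::kImporting;
    std::string _abortReason;
};

// Stands in for a file importer running on one recipient node.
class FileImporter {
public:
    FileImporter(std::string host, RecipientPrimary& primary)
        : _host(std::move(host)), _primary(primary) {}

    // Imports files in order and returns how many were imported.
    std::size_t run(const std::vector<std::string>& files,
                    const std::function<bool(const std::string&)>& importFile) {
        std::size_t imported = 0;
        for (const auto& file : files) {
            if (_primary.state() == MergeState::kAborted) {
                return imported;  // the primary aborted the merge: stop
            }
            if (!importFile(file)) {
                // Report the error, then stop immediately without waiting
                // for the primary to abort the migration.
                _primary.onVote({_host, false, "failed to import " + file});
                return imported;
            }
            ++imported;
        }
        _primary.onVote({_host, true, ""});
        return imported;
    }

private:
    std::string _host;
    RecipientPrimary& _primary;
};
```

In this sketch, an importer whose second file fails returns after importing one file, the primary's state becomes kAborted, and an importer started afterwards imports nothing. A real implementation would deliver the vote over the network and would need to handle retries and primary failover, which this sketch ignores.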
Some JS tests (see jstests/replsets/tenant_migration_donor_interrupt_on_stepdown_and_shutdown.js) have been temporarily disabled until recipient secondaries can inform the recipient primary of errors (in this case, for example, an error during cloning). In that test, the donor is shut down and cloning to the recipient secondaries partially fails. Because the secondaries cannot report the partial failure, the recipient primary moves on and transitions to the "learned filenames" state, so the secondaries start importing files even though cloning never completed.
- depends on: SERVER-61144 Finish importing donated collections on secondaries (Closed)
- is duplicated by: SERVER-63120 Handle recipient secondary failures while performing file copy based cloning procedure. (Closed)
- is duplicated by: SERVER-63653 Abort merge on error in TenantFileImporterService (Closed)
- related to: SERVER-74619 Complete TODO listed in SERVER-63390 (Closed)