Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels: None
Assigned Teams: Catalog and Routing
Story Points: 2

Once a node steps up, it will try to recover the shardVersion as part of the resume migration hook.

Until the resumed migration is over, the shardVersion is marked as UNKNOWN, so no read or write operation can be served.

As part of the resume, the migration is completed. How it completes depends on whether the migration was committed or aborted:

If the migration was aborted, the donor will:

Exit the critical section on the recipient
Schedule a range deletion for possible orphans on the recipient
Delete the range deletion task locally

If the migration was committed, the donor will:

Exit the critical section on the recipient
Schedule a range deletion task locally for possible orphans on the donor
Delete the range deletion task on the recipient
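
The two completion paths above can be summarized in a small sketch. This is only an illustration of the step ordering described in this ticket, not the server's implementation: `Outcome` and `completion_steps` are hypothetical names, and the steps are plain strings rather than real operations.

```python
from enum import Enum


class Outcome(Enum):
    """Recovered decision for the interrupted migration (hypothetical type)."""
    ABORTED = "aborted"
    COMMITTED = "committed"


def completion_steps(outcome: Outcome) -> list[str]:
    """Return the ordered steps the donor performs to complete a resumed migration."""
    # Both outcomes start by releasing the critical section on the recipient.
    steps = ["exit critical section on recipient"]
    if outcome is Outcome.ABORTED:
        # Aborted: possible orphans are on the recipient, and the donor's
        # local range deletion task is removed.
        steps.append("schedule range deletion on recipient")
        steps.append("delete range deletion task on donor")
    else:
        # Committed: possible orphans are on the donor, and the recipient's
        # range deletion task is removed.
        steps.append("schedule range deletion on donor")
        steps.append("delete range deletion task on recipient")
    return steps
```

In both cases the critical section on the recipient is released first; only the side that keeps or drops its range deletion task is mirrored.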

Ideally, the entire completion could be done asynchronously, which would re-enable reads and writes on the donor sooner.
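
As a rough model of the difference, the sketch below contrasts a blocking resume with one that hands the completion to a background thread. It assumes the shardVersion can be recovered before the completion steps finish, which this ticket does not confirm. `DonorState`, `resume_sync`, `resume_async`, and the callables passed in are all hypothetical names used for illustration only.

```python
import threading
from typing import Callable, Optional

UNKNOWN = None  # stand-in for the UNKNOWN shardVersion state


class DonorState:
    """Toy model of the donor: it can serve only once the version is known."""

    def __init__(self) -> None:
        self.shard_version: Optional[int] = UNKNOWN

    def can_serve(self) -> bool:
        return self.shard_version is not UNKNOWN


def resume_sync(state: DonorState,
                recover_version: Callable[[], int],
                complete: Callable[[], None]) -> None:
    # Blocking model: the version stays UNKNOWN until completion has finished.
    complete()
    state.shard_version = recover_version()


def resume_async(state: DonorState,
                 recover_version: Callable[[], int],
                 complete: Callable[[], None]) -> threading.Thread:
    # Suggested model (assumption): recover the version first, then run the
    # completion steps in the background so reads and writes are not blocked.
    state.shard_version = recover_version()
    worker = threading.Thread(target=complete, daemon=True)
    worker.start()
    return worker
```

Whether the completion steps are safe to run after the donor resumes serving operations is exactly the kind of cost this ticket asks to evaluate.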

Note that this ticket is only a suggestion arising from the conclusions of the BF-34016 investigation, where the recovery on the donor caused a transaction on the recipient to block. The time and cost required to implement it should be evaluated carefully.

In general, we should also evaluate whether the benefit of such an implementation would outweigh its cost.

Assignee: Unassigned

Reporter: Enrico Golfieri

Participants: Enrico Golfieri

Votes: 0

Watchers: 5

Created: Jul 17 2024 12:52:02 PM UTC

Updated: Jul 18 2024 10:32:53 AM UTC
