  1. Core Server
  2. SERVER-88978

Resharding coordinator should ensure participants have seen kAbort before dropping temp collection metadata

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Affects Version/s: 5.0.0, 6.0.0, 7.0.0, 7.2.0, 8.0.0-rc0, 7.3.0
    • Component/s: None
    • None
    • Cluster Scalability
    • Fully Compatible
    • ALL
    • v8.0, v7.3, v7.0, v6.0, v5.0
    • Cluster Scalability 2024-4-29, Cluster Scalability 2024-5-13, Cluster Scalability 2024-5-27
    • 105

      When the resharding coordinator aborts, it performs the following steps:
      1. Transition the state document to kAbort.
      2. Send the _shardsvrAbortReshardCollection command to the participants.
      3. Proceed with cleaning up the resharding temporary collection metadata.

      However, by the time (3) executes, there is no guarantee that the shards have seen the transition to kAbort from (1). This is because (2) only clears the filtering metadata on the primary nodes and issues a best-effort asynchronous sharding metadata refresh, which in turn asynchronously flushes the ShardServerCatalogCacheLoader. This can be problematic if a failover promotes a secondary that is not yet aware of kAbort.
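
      The ordering problem can be illustrated with a small toy model. This is only a sketch and not MongoDB server code: the Node and Shard classes, the config dictionary, and the "pre-abort" placeholder state are all hypothetical names chosen for illustration; only kAbort and the three step numbers come from the description above. The model assumes the asynchronous refresh scheduled by step (2) is lost when the primary steps down.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    last_seen_state: str = "pre-abort"  # placeholder, not a real state name


@dataclass
class Shard:
    primary: Node
    secondaries: list = field(default_factory=list)
    pending_refresh: bool = False

    def handle_abort_command(self):
        # Step (2): the primary clears its filtering metadata and schedules a
        # best-effort refresh. Nothing is made durable before returning.
        self.pending_refresh = True

    def step_up(self):
        # Failover: the first secondary becomes primary and the refresh that
        # was only scheduled on the old primary never takes effect.
        self.secondaries.append(self.primary)
        self.primary = self.secondaries.pop(0)
        self.pending_refresh = False


def abort_without_waiting(shard, config):
    config["coordinator_state"] = "kAbort"      # step (1)
    shard.handle_abort_command()                # step (2)
    config["temp_collection_metadata"] = False  # step (3)


shard = Shard(Node("n1"), [Node("n2")])
config = {"coordinator_state": "pre-abort", "temp_collection_metadata": True}

abort_without_waiting(shard, config)
shard.step_up()

# The new primary has a stale view although the temp metadata is already gone.
stale = shard.primary.last_seen_state != config["coordinator_state"]
print(shard.primary.name, stale, config["temp_collection_metadata"])
```

      In this model the new primary (n2) still holds the pre-abort state after step (3) has already removed the temporary collection metadata, which is the window the ticket describes.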

      One solution could be to make (2) perform this sharding metadata refresh plus a durable (majority) flush of the ShardServerCatalogCacheLoader.
      Another solution, which is more in line with other callers of _updateCoordinatorDocStateAndCatalogEntries, would be to call _tellAllDonorsToRefresh() (and _tellAllRecipientsToRefresh() too?) right after this line in the abort procedure.
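
      Both proposals share one idea: the coordinator should not reach step (3) until every participant has durably observed kAbort. The toy model below sketches that ordering under stated assumptions. It is not the actual fix: refresh_and_flush, abort_with_refresh, and the committed_state field are hypothetical stand-ins for a refresh whose result is majority-committed, and the election rule is simplified so that only a node holding the committed state can step up.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    last_seen_state: str = "pre-abort"  # placeholder, not a real state name


@dataclass
class Shard:
    primary: Node
    secondaries: list = field(default_factory=list)
    committed_state: str = "pre-abort"

    def refresh_and_flush(self, state):
        # Hypothetical synchronous refresh: the state reaches a majority of
        # nodes (primary first) before the shard acknowledges.
        nodes = [self.primary] + self.secondaries
        majority = len(nodes) // 2 + 1
        for node in nodes[:majority]:
            node.last_seen_state = state
        self.committed_state = state
        return True

    def step_up(self):
        # Simplified election: only a secondary that holds the committed
        # state is eligible to become primary.
        eligible = [n for n in self.secondaries
                    if n.last_seen_state == self.committed_state]
        new_primary = eligible[0]
        self.secondaries.remove(new_primary)
        self.secondaries.append(self.primary)
        self.primary = new_primary


def abort_with_refresh(shards, config):
    config["coordinator_state"] = "kAbort"                  # step (1)
    acks = [s.refresh_and_flush("kAbort") for s in shards]  # step (2), waited on
    if not all(acks):
        raise RuntimeError("a participant did not acknowledge kAbort")
    config["temp_collection_metadata"] = False              # step (3)


shards = [
    Shard(Node("d1"), [Node("d2"), Node("d3")]),  # toy donor shard
    Shard(Node("r1"), [Node("r2"), Node("r3")]),  # toy recipient shard
]
config = {"coordinator_state": "pre-abort", "temp_collection_metadata": True}

abort_with_refresh(shards, config)
for s in shards:
    s.step_up()

new_primaries = [s.primary.name for s in shards]
all_aware = all(s.primary.last_seen_state == "kAbort" for s in shards)
print(new_primaries, all_aware)
```

      Here a failover after the abort still yields primaries that know about kAbort, because the acknowledgement in step (2) is only returned once the state is on a majority of nodes.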

            Assignee:
            abdul.qadeer@mongodb.com Abdul Qadeer
            Reporter:
            jordi.serra-torrens@mongodb.com Jordi Serra Torrens
            Votes:
            0
            Watchers:
            6

              Created:
              Updated:
              Resolved: