  1. Core Server
  2. SERVER-89529

Retryable writes during resharding may execute more than once if chunk migration follows the reshard operation

    • Cluster Scalability
    • Fully Compatible
    • ALL
    • v8.0, v7.3, v7.0, v6.0, v5.0

      Resharding preserves the full retryability history for any retryable writes that occur during the resharding operation. If a chunk migration follows the resharding, session migration should transfer the relevant write history to the recipient of the chunk. Chunk migration decides whether an oplog entry is relevant by filtering on the namespace being migrated.

      The problem is that when a resharding recipient updates its config.transactions table (based on the retryable writes and transactions performed on the donor shard), it creates a noop oplog entry with an empty namespace. If the resharding recipient then becomes the donor in a subsequent chunk migration, the empty namespace causes it to incorrectly conclude that this oplog entry is not relevant to the chunk being migrated. As a result, the noop oplog entry for the already-executed retryable write is never migrated, and the retryable write can be executed again after the chunk migration commits.
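
The filtering behavior described above can be modeled with a small sketch. This is a simplified Python illustration, not the server's C++ implementation: the dictionary layout, the function name `is_relevant_to_migration`, and the example namespace `test.coll` are all assumptions made for illustration.

```python
# Simplified model of namespace-based filtering during session migration.
# All names here are hypothetical; the real logic lives in the server's C++ code.

MIGRATING_NS = "test.coll"  # assumed namespace of the chunk being migrated


def is_relevant_to_migration(oplog_entry, migrating_ns):
    """Return True if the entry's namespace matches the migrating namespace."""
    return oplog_entry["ns"] == migrating_ns


# An ordinary retryable update carries the collection namespace.
regular_entry = {"op": "u", "ns": "test.coll", "txnNumber": 5, "stmtId": 0}

# The noop entry a resharding recipient writes when it updates
# config.transactions has an empty namespace, per the description above.
resharding_noop = {"op": "n", "ns": "", "txnNumber": 5, "stmtId": 0}

entries = [regular_entry, resharding_noop]
migrated = [e for e in entries if is_relevant_to_migration(e, MIGRATING_NS)]

# Only the regular entry passes the filter; the resharding noop is dropped,
# so its retryability record never reaches the chunk recipient.
print(migrated)
```

In this model, a filter keyed only on `ns` cannot tell that the noop entry records a write against the migrating collection, which is the gap the ticket describes.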

       

      Adding Max's repro for this issue:

      1. Start a resharding operation
      2. Run a retryable $inc update during the resharding operation
      3. Resharding operation completes
      4. Run chunk migration
      5. Retry the retryable write from step 2 and verify that no new oplog entry was generated
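
The repro steps above can be walked through with a toy in-memory model. This is only a sketch under stated assumptions: the `Shard` class, `reshard`, and `migrate_chunk` are hypothetical stand-ins for real cluster operations, the write is applied just before the modeled reshard rather than concurrently with it, and "a new oplog entry was generated" is approximated by the counter being incremented a second time.

```python
# Toy model of the repro. Shards are plain Python objects, not mongod processes.
# Every class and function name here is hypothetical.

NS = "test.coll"  # assumed namespace of the sharded collection


class Shard:
    def __init__(self):
        self.doc = {"_id": 1, "counter": 0}
        self.history = []  # records of (ns, lsid, txn_number, stmt_id)

    def retryable_inc(self, ns, lsid, txn_number, stmt_id):
        """Apply {$inc: {counter: 1}} unless this statement already ran."""
        key = (lsid, txn_number, stmt_id)
        if any(record[1:] == key for record in self.history):
            return False  # statement found in history: do not execute again
        self.doc["counter"] += 1
        self.history.append((ns, lsid, txn_number, stmt_id))
        return True


def reshard(donor):
    """Copy data and history; history is rewritten with an empty namespace."""
    recipient = Shard()
    recipient.doc = dict(donor.doc)
    recipient.history = [("",) + record[1:] for record in donor.history]
    return recipient


def migrate_chunk(donor, ns):
    """Copy data, but only history records whose namespace matches ns."""
    recipient = Shard()
    recipient.doc = dict(donor.doc)
    recipient.history = [r for r in donor.history if r[0] == ns]
    return recipient


original = Shard()
original.retryable_inc(NS, "session-1", 1, 0)      # steps 1-2: the $inc runs once
resharded = reshard(original)                      # step 3: resharding completes
retried_after_reshard = resharded.retryable_inc(NS, "session-1", 1, 0)
final = migrate_chunk(resharded, NS)               # step 4: chunk migration
retried_after_migration = final.retryable_inc(NS, "session-1", 1, 0)  # step 5

print(retried_after_reshard, retried_after_migration, final.doc["counter"])
```

In this model the retry is correctly suppressed right after resharding, but it executes again after the chunk migration because the empty-namespace history record was filtered out. A fixed implementation would leave the counter at 1 in step 5.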

            Assignee:
            ben.gawel@mongodb.com Ben Gawel
            Reporter:
            kruti.shah@mongodb.com Kruti Shah
            Votes:
            0
            Watchers:
            15
