  1. Core Server
  2. SERVER-40713

Enable fsm workloads that use moveChunk in sharded stepdown suites

    • Type: Task
    • Resolution: Duplicate
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Sharding
    • Sharding
    • Sharding 2019-06-17, Sharding 2019-07-15, Sharding 2019-08-26

      A few concurrency workloads explicitly use moveChunk. They are currently disallowed in the concurrency stepdown suites because the network error retry override treats moveChunk as a non-retryable command. Conceptually, moving a chunk is retryable; it was likely disallowed because an interrupted moveChunk can return error codes that are not retryable by default (e.g. OperationFailed if persisting the critical section signal fails).

      To get more coverage of stepdowns concurrent with moveChunk, it should be possible to add special logic to the network override that handles the particular errors moveChunk returns, similar to the existing workarounds for other operations that return inconsistent codes. This would be especially valuable for the workloads that move chunks while running transactions, like random_moveChunk_broadcast_update_transaction.js, and agg_with_chunk_migrations.js when running in the concurrency_sharded_multi_stmt_txn_with_stepdowns suite.
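
      A minimal sketch of what such per-command handling could look like. This is an illustration only, not the actual override code: the names shouldRetry, runWithRetry, RETRYABLE_BY_DEFAULT and EXTRA_RETRYABLE_CODES are hypothetical, and the choice of OperationFailed (96) as the extra retryable code for moveChunk is an assumption drawn from the example above.

```javascript
// Hypothetical sketch: names and structure are assumptions, not the real override.

// A few error codes that are typically treated as retryable after a stepdown.
const RETRYABLE_BY_DEFAULT = new Set([
    6,      // HostUnreachable
    10107,  // NotMaster
    11602,  // InterruptedDueToReplStateChange
]);

// Extra codes to treat as retryable for specific commands only (assumption).
const EXTRA_RETRYABLE_CODES = new Map([
    ["moveChunk", new Set([96])],  // OperationFailed
]);

// Returns true if a failed response for cmdName should be retried.
function shouldRetry(cmdName, res) {
    if (res.ok === 1) {
        return false;
    }
    if (RETRYABLE_BY_DEFAULT.has(res.code)) {
        return true;
    }
    const extra = EXTRA_RETRYABLE_CODES.get(cmdName);
    return extra !== undefined && extra.has(res.code);
}

// Calls runCommand() up to maxAttempts times, stopping at the first
// response that is not considered retryable. Returns the last response.
function runWithRetry(cmdName, runCommand, maxAttempts = 3) {
    let res;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        res = runCommand();
        if (!shouldRetry(cmdName, res)) {
            return res;
        }
    }
    return res;
}
```

      Keying the extra codes by command name keeps OperationFailed non-retryable for every other command, so only moveChunk gets the relaxed behavior.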

            Assignee:
            backlog-server-sharding [DO NOT USE] Backlog - Sharding Team
            Reporter:
            jack.mulrow@mongodb.com Jack Mulrow
            Votes:
            0
            Watchers:
            4

              Created:
              Updated:
              Resolved: