Core Server / SERVER-97736

DDLs passing through DDL coordinators hang as long as majority write concern is not available

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: 5.0.0, 6.0.0, 7.0.0, 8.0.0
    • Component/s: None
    • Catalog and Routing
    • ALL
    • 2

      In order for DDLs to make forward progress on sharded clusters, we require the operation to be majority committed on all involved shards because we can't afford partial rollbacks in a distributed system.

      When a user invokes a DDL operation that passes through a sharding DDL coordinator, the current behavior is the following:

      1. Client sends DDL command to router
      2. Router sends DDL command to the primary shard, overriding write concerns w:N with w:majority but preserving wtimeout if set
      3. Shard instantiates a DDL coordinator (independent thread)
      4. The command waits for the DDL to succeed (that is, for the coordinator thread to hand back control)
      5. The command waits up to wtimeout for the write concern to be acknowledged
      6. Shard returns -> Router returns -> Client gets reply
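
      The write concern override in step 2 can be sketched as follows. This is only an illustrative model, not the server's implementation: the function name override_write_concern and the dict-shaped command are assumptions, and passing through fields other than w and wtimeout unchanged is also an assumption.

```python
def override_write_concern(cmd: dict) -> dict:
    """Hypothetical model of step 2: force w to "majority", keep wtimeout if set."""
    forwarded = dict(cmd)  # shallow copy so the caller's command is untouched
    # Copy the client's write concern (if any); wtimeout survives this copy.
    write_concern = dict(cmd.get("writeConcern", {}))
    # Replace whatever w the client asked for (for example w: 1) with majority.
    write_concern["w"] = "majority"
    forwarded["writeConcern"] = write_concern
    return forwarded
```

      For example, a command carrying {"w": 1, "wtimeout": 500} would be forwarded with {"w": "majority", "wtimeout": 500}.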

      On replica set deployments DDLs are performed locally, so when wtimeout expires the command promptly returns a write concern error to the user.

      However, on sharded clusters DDL commands first wait for the coordinator thread (step 4), a wait that wtimeout does not bound, and may hang for as long as any shard involved in the DDL does not have a majority of its nodes available.
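
      The difference between the two deployments can be illustrated with a small simulation of steps 4 and 5. This is a sketch under stated assumptions, not server code: run_ddl_command, the two events, and the result strings are hypothetical, and safety_cap_s exists only so the demo terminates (the real wait in step 4 has no such cap).

```python
import threading


def run_ddl_command(coordinator_done: threading.Event,
                    majority_acked: threading.Event,
                    wtimeout_s: float,
                    sharded: bool,
                    safety_cap_s: float = 1.0) -> str:
    """Hypothetical model of the shard-side command path (steps 4 and 5)."""
    if sharded:
        # Step 4: wait for the coordinator thread. wtimeout is NOT applied here.
        # safety_cap_s is a demo-only assumption; the real wait is unbounded.
        if not coordinator_done.wait(timeout=safety_cap_s):
            return "hung"
    # Step 5: wait up to wtimeout for the write concern to be acknowledged.
    if majority_acked.wait(timeout=wtimeout_s):
        return "ok"
    return "writeConcernError"


# Replica set, majority unavailable: returns promptly once wtimeout expires.
replica_set_result = run_ddl_command(
    threading.Event(), threading.Event(), wtimeout_s=0.05, sharded=False)

# Sharded cluster, coordinator cannot make progress: wtimeout is never reached.
sharded_result = run_ddl_command(
    threading.Event(), threading.Event(), wtimeout_s=0.05, sharded=True,
    safety_cap_s=0.2)
```

      In this model the replica set call yields "writeConcernError", while the sharded call never gets to the wtimeout wait and is reported as "hung".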

            Assignee: Unassigned
            Reporter: Pierlauro Sciarelli (pierlauro.sciarelli@mongodb.com)
            Votes: 0
            Watchers: 5