- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: 5.0.0, 6.0.0, 7.0.0, 8.0.0
- Component/s: None
- Catalog and Routing
In order for DDLs to make forward progress on sharded clusters, we require the operation to be majority committed on all involved shards because we can't afford partial rollbacks in a distributed system.
When a user invokes a DDL operation that passes through a sharding DDL coordinator, the current behavior is the following:
1. Client sends the DDL command to the router
2. Router sends the DDL command to the primary shard, overriding write concerns w:N with w:majority but preserving wtimeout if set
3. Shard instantiates a DDL coordinator (independent thread)
4. The command waits for the DDL to succeed (namely, it waits for the coordinator thread to hand control back)
5. The command waits up to wtimeout for the write concern to be acknowledged
6. Shard returns -> Router returns -> Client gets reply
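The write concern rewrite performed by the router can be illustrated with a minimal Python sketch. The function name `override_write_concern` is hypothetical and not taken from the server code. The sketch assumes that w:majority is forwarded even when no write concern is supplied, and it ignores any write concern fields other than `w` and `wtimeout`.

```python
def override_write_concern(user_wc):
    """Hypothetical helper mimicking the router's rewrite of the write concern.

    The user's `w` value is discarded in favor of "majority";
    `wtimeout` is carried over only if the user set it.
    """
    forwarded = {"w": "majority"}
    if user_wc and "wtimeout" in user_wc:
        forwarded["wtimeout"] = user_wc["wtimeout"]
    return forwarded


# The user asked for w:2 with a 5000 ms timeout.
print(override_write_concern({"w": 2, "wtimeout": 5000}))
# → {'w': 'majority', 'wtimeout': 5000}
```

Note that the forwarded wtimeout only bounds the wait for write concern acknowledgement; it does not bound the wait on the coordinator thread.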
On replica set deployments DDLs are performed locally, so when wtimeout expires the command promptly returns a write concern error to the user.
However, on sharded clusters DDL commands wait for the coordinator thread (step 4) and, because wtimeout is only applied afterwards, may hang for as long as any shard involved in the DDL does not have a majority of its nodes available.
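The difference between the two deployments can be reproduced with a toy Python model of the two waits. This is a sketch under stated assumptions, not the server implementation: `run_ddl_command`, the two `threading.Event` objects, and `demo_bound_s` are all hypothetical. `demo_bound_s` exists only so the demo terminates (the real coordinator wait has no such bound), and timeouts are in seconds here, whereas wtimeout is expressed in milliseconds.

```python
import threading
import time


def run_ddl_command(coordinator_done, majority_acked, wtimeout_s, demo_bound_s=None):
    """Toy model of the coordinator wait followed by the write concern wait."""
    start = time.monotonic()

    # Coordinator wait: wtimeout_s is NOT consulted here.
    # demo_bound_s is an artificial cap so this demo can finish.
    if not coordinator_done.wait(timeout=demo_bound_s):
        return "still waiting on coordinator", time.monotonic() - start

    # Write concern wait: bounded by wtimeout_s.
    if majority_acked.wait(timeout=wtimeout_s):
        return "ok", time.monotonic() - start
    return "write concern timeout", time.monotonic() - start


# Replica-set-like case: the DDL completes locally, but no majority acks.
local_done = threading.Event()
local_done.set()
rs_status, rs_elapsed = run_ddl_command(local_done, threading.Event(), wtimeout_s=0.1)

# Sharded case: the coordinator never finishes because a shard lacks a majority.
sc_status, sc_elapsed = run_ddl_command(
    threading.Event(), threading.Event(), wtimeout_s=0.1, demo_bound_s=0.5
)

print(rs_status)  # write concern timeout
print(sc_status)  # still waiting on coordinator
```

In the second call the command is still blocked well after the 0.1 s timeout has elapsed, which mirrors the hang described above.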
- related to: SERVER-97754 User-provided write concern is not honored for DDL operations on sharded clusters (always w:majority)
- Needs Scheduling