Some DDL coordinator implementations (such as movePrimary) are designed not to always make forward progress on retriable errors. These classes set the _completeOnError flag, which prevents the operation from being retried when a retriable error occurs.
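The retry policy described above can be sketched as follows. This is a toy model, not the server's code: `Coordinator`, `FlakyCoordinator`, `RetriableError`, `run_once` and the returned status strings are hypothetical names, and `complete_on_error` only plays the role of the `_completeOnError` flag.

```python
class RetriableError(Exception):
    """Hypothetical stand-in for a retriable error such as a config server stepdown."""


class Coordinator:
    """Hypothetical coordinator base class; only the retry policy is modeled."""

    def __init__(self, complete_on_error: bool) -> None:
        # Plays the role of the _completeOnError flag (assumed semantics).
        self.complete_on_error = complete_on_error
        self.attempts = 0

    def run_once(self) -> None:
        """One attempt at the DDL operation; subclasses override this."""
        raise NotImplementedError

    def run(self) -> str:
        while True:
            self.attempts += 1
            try:
                self.run_once()
                return "committed"
            except RetriableError:
                if self.complete_on_error:
                    # Do not retry: the operation completes with the error.
                    return "completed_with_error"
                # Otherwise fall through and retry the same operation.


class FlakyCoordinator(Coordinator):
    """Hypothetical coordinator that fails for its first `failures` attempts."""

    def __init__(self, complete_on_error: bool, failures: int) -> None:
        super().__init__(complete_on_error)
        self.failures = failures

    def run_once(self) -> None:
        if self.attempts <= self.failures:
            raise RetriableError("simulated stepdown")
```

For example, `FlakyCoordinator(complete_on_error=False, failures=2).run()` returns `"committed"` after three attempts. With `complete_on_error=True` it returns `"completed_with_error"` after the first attempt.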
The purpose of this task is to ensure that the distributed locks are released when a retriable error (such as a stepdown on the config server) occurs in a DDLCoordinator implementation that has the _completeOnError flag set to true. The following scenario illustrates the problem:
- A movePrimary command starts
- The config server steps down while the command is committing
This leaves the primary node of the database's primary shard holding the database distributed lock. Only operations that try to acquire that database distributed lock on the config server after this scenario has occurred would be affected.
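A minimal sketch of the intended behavior follows, under loudly stated assumptions. `DistLockManager` is a toy in-memory stand-in for the config server's distributed locks. `dist_lock` and `run_move_primary` are hypothetical helpers, and the real server's lock API is not modeled. Because the lock is held in a `try`/`finally` context manager, it is released even when the coordinator gives up on a retriable error because `complete_on_error` is set.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


class RetriableError(Exception):
    """Hypothetical stand-in for a retriable error, e.g. a config server stepdown."""


class DistLockManager:
    """Toy in-memory stand-in for the config server's distributed locks."""

    def __init__(self) -> None:
        self.holders: Dict[str, str] = {}

    def try_lock(self, name: str, owner: str) -> bool:
        if name in self.holders:
            return False
        self.holders[name] = owner
        return True

    def unlock(self, name: str, owner: str) -> None:
        # Only the current owner may release the lock.
        if self.holders.get(name) == owner:
            del self.holders[name]


@contextmanager
def dist_lock(mgr: DistLockManager, name: str, owner: str) -> Iterator[None]:
    """Hypothetical helper: hold `name` for the duration of the block."""
    if not mgr.try_lock(name, owner):
        raise RuntimeError(f"lock {name!r} is busy")
    try:
        yield
    finally:
        # Runs on success, on early return and on exceptions alike.
        mgr.unlock(name, owner)


def run_move_primary(mgr: DistLockManager, db: str, owner: str,
                     commit: Callable[[], None], complete_on_error: bool) -> str:
    """Hypothetical movePrimary-like flow guarded by the database lock."""
    with dist_lock(mgr, db, owner):
        while True:
            try:
                commit()
                return "committed"
            except RetriableError:
                if complete_on_error:
                    # Give up without retrying; the finally clause still unlocks.
                    return "completed_with_error"
```

In a real cluster, the unlock request can also hit a retriable error, so it would need its own retry. The sketch does not model that.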
Is caused by:
- SERVER-55150 Add whitelist of errors that will not be retried on rename collection path (Closed)