- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- None
- Catalog and Routing
- Fully Compatible
- ALL
- v8.0
- CAR Team 2024-09-02
- 0
SERVER-21185 made the shard primary responsible for updating its corresponding connection string in config.shards on the CSRS.
This job is started during both step-up and reconfig. The task's code does not use a single ReplSetConfig snapshot; instead, it fetches the connection string first and the config version separately, later. The config version is used to prevent overwriting newer concurrent updates. However, because the two reads do not come from the same snapshot, an update job can read the connection string and, by the time it fetches the config version, observe a newer version installed by a concurrent reconfig. This results in writing an old connection string with the new config version.
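The torn read described above can be illustrated with a minimal Python sketch. It is not the server's code: `ReplSetConfig`, `Node`, `read_unsnapshotted`, `read_snapshotted`, the set name `rs0`, and the host names are all hypothetical stand-ins, and the `between_reads` hook only exists to force a reconfig between the two reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplSetConfig:
    """Toy stand-in for a replica set config (hypothetical, not the server's class)."""
    version: int
    members: tuple

    def connection_string(self, set_name="rs0"):
        return f"{set_name}/{','.join(self.members)}"


class Node:
    """Holds the current config; reconfig() swaps in a new immutable config."""

    def __init__(self, config):
        self.config = config

    def reconfig(self, members):
        self.config = ReplSetConfig(self.config.version + 1, tuple(members))


def read_unsnapshotted(node, between_reads=lambda: None):
    # Two separate reads of node.config: a reconfig can land in between.
    conn = node.config.connection_string()
    between_reads()
    version = node.config.version
    return conn, version


def read_snapshotted(node, between_reads=lambda: None):
    # One read of node.config: both values come from the same object.
    snapshot = node.config
    conn = snapshot.connection_string()
    between_reads()
    return conn, snapshot.version


node = Node(ReplSetConfig(1, ("a:27017", "b:27017")))
torn = read_unsnapshotted(node, lambda: node.reconfig(["a:27017"]))

node2 = Node(ReplSetConfig(1, ("a:27017", "b:27017")))
consistent = read_snapshotted(node2, lambda: node2.reconfig(["a:27017"]))
```

In this sketch `torn` pairs the pre-reconfig connection string with version 2, while `consistent` pairs it with version 1, so a later update carrying version 2 would still be able to replace it.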
Attached SERVER-93707.diff:
- Node steps up and runs scheduleReplicaSetUpdateOnConfigServerIfNeeded.
- Task launched with StepUp starts:
  - Reads the connection string.
  - Pauses after reading the connection string.
- Reconfig removes a secondary node, triggering another scheduleReplicaSetUpdateOnConfigServerIfNeeded call.
- Task launched with reconfig starts:
  - Pauses before the update.
- Task launched with StepUp:
  - Resumes execution.
  - Reads the config version.
  - Updates config.shards with the ConnString from before the node removal, but with the config version from after the node removal.
- Task launched with reconfig:
  - Tries to update with the ConnString that reflects the removed node and the current config version.
  - The update does nothing because the current config version is already in config.shards.
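The interleaving above can be replayed deterministically in a small Python sketch. This is only an illustration under stated assumptions: `ConfigShards`, `update_if_newer`, `conn_string`, and `replay_interleaving` are hypothetical names, the guard is assumed to apply an update only when the incoming config version is strictly newer than the stored one, and the pauses are modelled simply by the order of the statements.

```python
class ConfigShards:
    """Toy config.shards entry with a version-guarded update (assumed semantics)."""

    def __init__(self, host, version):
        self.host = host
        self.version = version

    def update_if_newer(self, host, version):
        # Apply only when the incoming config version is strictly newer.
        if version <= self.version:
            return False
        self.host, self.version = host, version
        return True


def conn_string(members, set_name="rs0"):
    return f"{set_name}/{','.join(members)}"


def replay_interleaving():
    config = {"version": 1, "members": ["a:27017", "b:27017"]}
    shards = ConfigShards(conn_string(config["members"]), config["version"])

    # StepUp task: reads the connection string, then pauses.
    stepup_conn = conn_string(config["members"])

    # Reconfig removes secondary b and bumps the config version.
    config = {"version": 2, "members": ["a:27017"]}

    # Reconfig task: reads both values, then pauses before its update.
    reconfig_conn = conn_string(config["members"])
    reconfig_version = config["version"]

    # StepUp task resumes: reads the (now newer) version and updates.
    stepup_version = config["version"]
    stepup_applied = shards.update_if_newer(stepup_conn, stepup_version)

    # Reconfig task resumes: its version is already stored, so nothing changes.
    reconfig_applied = shards.update_if_newer(reconfig_conn, reconfig_version)

    return shards, stepup_applied, reconfig_applied


shards, stepup_applied, reconfig_applied = replay_interleaving()
```

After the replay, the toy entry still lists the removed member `b:27017` while carrying config version 2, and the correct update from the reconfig task has been rejected by the version guard.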
- is caused by: SERVER-21185 Make shard primary responsible for updating config server's knowledge of shard replica set members (Closed)