Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 8.1.0-rc0, 8.0.0-rc7, 7.3.4, 7.0.13
Affects Version/s: 5.0.0, 6.0.0, 7.0.0, 8.0.0-rc0, 7.3.0
Component/s: None
Labels:
None

Assigned Teams:

Catalog and Routing
Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:

v8.0, v7.3, v7.0, v6.0, v5.0
Sprint:
CAR Team 2024-05-13, CAR Team 2024-05-27
Linked BF Score:
200
Confidence Status:
None
Work Order:
3

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

DDL coordinators are created through the ShardingDDLCoordinator::getOrCreate function.

Internally, this function calls ShardingDDLCoordinator::waitForRecoveryCompletion to wait for the service to complete recovery and reach a stable state before creating a new coordinator. This prevents a new coordinator from acquiring its DDL lock (each DDL coordinator instance acquires one) before all previously spawned coordinators have been recovered and have acquired their respective DDL locks.

The waitForRecoveryCompletion function waits until the service reaches _state == kRecovered.
If this function is called while the node is a secondary, the state will be kPaused and will not become kRecovered until the node is elected primary again, so the wait does not complete until then.
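
The blocking behaviour described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the server code: ServiceModel, its on* methods and the timeout parameter are hypothetical names introduced here, and the timeout exists only so the example terminates (the function described in this ticket waits without one).

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical, simplified model of the service states named in this ticket.
enum class State { kPaused, kRecovering, kRecovered };

class ServiceModel {
public:
    // Step-down: the node becomes a secondary.
    void onStepDown() { setState(State::kPaused); }
    // Step-up: recovery of previously spawned coordinators starts.
    void onStepUp() { setState(State::kRecovering); }
    // Recovery finished: the service is stable again.
    void onRecoveryDone() { setState(State::kRecovered); }

    // Blocks until _state == kRecovered or the timeout expires.
    // Returns true only if kRecovered was observed.
    // The timeout is an assumption of this sketch, added so it can terminate.
    bool waitForRecoveryCompletion(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(_mutex);
        return _cv.wait_for(lk, timeout, [this] { return _state == State::kRecovered; });
    }

private:
    void setState(State s) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _state = s;
        }
        _cv.notify_all();  // wake waiters so they re-check the state
    }

    std::mutex _mutex;
    std::condition_variable _cv;
    State _state = State::kPaused;  // a secondary starts paused
};

// Usage sketch: on a secondary (kPaused) the wait can only end by timing out.
inline void demoWaitOnSecondary() {
    ServiceModel svc;
    assert(!svc.waitForRecoveryCompletion(std::chrono::milliseconds(20)));
}
```

In this model, a caller that arrives while the state is kPaused is released only after onStepUp() followed by onRecoveryDone(), which mirrors the "elected primary again" condition above.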

Looking closely at this code, I spotted another issue. Since we are not holding the _state lock, there is no guarantee that the _state of the service will not change back to kRecovering in between:

1. The call to waitForRecoveryCompletion()
2. The actual creation of the coordinator

In fact, after 1. the node could step down (kRecovered -> kPaused) and then step up again (kPaused -> kRecovering) before executing 2.
This second issue is highly improbable, because it would require a full stepdown and stepup cycle to complete within a few milliseconds.
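
The window between the state check and the coordinator creation can be reproduced deterministically in a standalone model. Everything below is a hypothetical sketch: ServiceModel, CreateResult, the interleave callback and the _epoch counter are names invented for illustration, and the guarded variant is just one possible way to close the window, not necessarily what the fix for this ticket does. Unlike the real function, the model returns instead of waiting when the state is not kRecovered.

```cpp
#include <cassert>
#include <functional>
#include <mutex>

enum class State { kPaused, kRecovering, kRecovered };

// Outcome of a creation attempt in this model.
struct CreateResult {
    bool created;
    State stateAtCreation;  // service state at the moment of the decision
};

class ServiceModel {
public:
    void stepDown() {
        std::lock_guard<std::mutex> lk(_mutex);
        _state = State::kPaused;
    }
    void stepUp() {
        std::lock_guard<std::mutex> lk(_mutex);
        _state = State::kRecovering;
        ++_epoch;  // hypothetical counter, bumped on every step-up
    }
    void finishRecovery() {
        std::lock_guard<std::mutex> lk(_mutex);
        _state = State::kRecovered;
    }

    // Racy: step 1 checks the state, the lock is dropped, step 2 creates.
    // `interleave` stands in for whatever runs inside that window.
    CreateResult getOrCreateRacy(const std::function<void()>& interleave) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            if (_state != State::kRecovered) return CreateResult{false, _state};
        }
        interleave();
        std::lock_guard<std::mutex> lk(_mutex);
        return CreateResult{true, _state};  // created without re-checking
    }

    // Guarded: remember the epoch seen at step 1 and re-validate both the
    // state and the epoch under the lock before creating at step 2.
    CreateResult getOrCreateGuarded(const std::function<void()>& interleave) {
        long long epochSeen = 0;
        {
            std::lock_guard<std::mutex> lk(_mutex);
            if (_state != State::kRecovered) return CreateResult{false, _state};
            epochSeen = _epoch;
        }
        interleave();
        std::lock_guard<std::mutex> lk(_mutex);
        if (_state != State::kRecovered || _epoch != epochSeen) {
            return CreateResult{false, _state};
        }
        return CreateResult{true, _state};
    }

private:
    std::mutex _mutex;
    State _state = State::kPaused;
    long long _epoch = 0;
};

// Usage sketch: a stepdown and stepup inside the window lets the racy
// variant create a coordinator while the service is recovering again.
inline void demoRace() {
    ServiceModel svc;
    svc.stepUp();
    svc.finishRecovery();
    CreateResult r = svc.getOrCreateRacy([&svc] { svc.stepDown(); svc.stepUp(); });
    assert(r.created && r.stateAtCreation == State::kRecovering);
}
```

The epoch comparison also covers the case where a full stepdown, stepup and recovery all complete inside the window: the state would read kRecovered again, but the epoch would differ from the one observed at step 1.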

causes

SERVER-91247 Ensure that DDLCoordinator creation does not survive node stepDown-stepUp

Closed

is duplicated by

SERVER-90628 _shardsvrReshardCollection command doesn't always get interrupted on stepdown

Closed

Assignee: Tommaso Tocci
Reporter: Tommaso Tocci
Participants: Githook User, Tommaso Tocci
Votes: 0
Watchers: 6

Created: May 08 2024 03:43:10 PM UTC
Updated: Jul 12 2024 04:51:19 PM UTC
Resolved: May 24 2024 06:14:12 AM UTC
Confidence Status Last Update: 08/May/24 3:51 PM
