Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 8.1.0-rc0, 8.0.0-rc8, 7.3.4, 7.0.13
Affects Version/s: None
Component/s: None
Labels: None
Assigned Teams: Catalog and Routing
Backwards Compatibility: Fully Compatible
Backport Requested: v8.0, v7.3, v7.0, v6.0, v5.0
Sprint: CAR Team 2024-06-10
Linked BF Score: 200
Confidence Status: None
Work Order: 3
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name: None
Goal Link: None

When we create a DDLCoordinator, a lambda is attached to its getConstructionCompletionFuture().

There is no way to chain the lambda onto the executor that fulfills the promise behind getConstructionCompletionFuture(), so getInstanceCleanupExecutor() is used as the executor instead.

However, this means the creation of a DDLCoordinator can survive a stepDown/stepUp cycle, because the cleanup executor is never shut down.

If the lambda runs after ShardingDDLCoordinatorService::_onServiceInitialization() sets _status to Recovering, but before the async task created by ShardingDDLCoordinatorService::_rebuildService loads the coordinators to recover, then _numCoordinatorsToWait is still 0, as set in ShardingDDLCoordinatorService::_onServiceTermination(), and this invariant fails.
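To make the interleaving concrete, here is a minimal single-threaded sketch of it. It is a toy model, not the server code: ToyDDLService, its fields, and the boolean return value are hypothetical, and the assumption that the continuation decrements the counter while the service is recovering is inferred from the description above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified model of the service state described in this ticket.
enum class State { kPaused, kRecovering, kRecovered };

struct ToyDDLService {
    State state = State::kPaused;
    std::size_t numCoordinatorsToWait = 0;

    // Step-down: the counter is reset to 0.
    void onServiceTermination() {
        state = State::kPaused;
        numCoordinatorsToWait = 0;
    }

    // Step-up, first half: recovery is announced before any coordinator is loaded.
    void onServiceInitialization() {
        state = State::kRecovering;
    }

    // Step-up, second half (asynchronous in the real service): the coordinators
    // that need recovery are counted.
    void rebuildService(std::size_t coordinatorsFound) {
        numCoordinatorsToWait = coordinatorsFound;
    }

    // Construction-completion continuation. Returns false in the situation where
    // the real code would trip its invariant (assumed to be "counter > 0").
    bool onConstructionCompleted() {
        if (state != State::kRecovering) {
            return true;  // nothing to account for
        }
        if (numCoordinatorsToWait == 0) {
            return false;  // the invariant would fail here
        }
        if (--numCoordinatorsToWait == 0) {
            state = State::kRecovered;
        }
        return true;
    }
};

// A continuation left over from the previous term runs in the window between
// the two halves of step-up.
inline bool staleContinuationIsSafe() {
    ToyDDLService svc;
    svc.onServiceTermination();
    svc.onServiceInitialization();
    return svc.onConstructionCompleted();
}

// The same continuation runs only after the coordinators have been counted.
inline bool orderedContinuationIsSafe() {
    ToyDDLService svc;
    svc.onServiceTermination();
    svc.onServiceInitialization();
    svc.rebuildService(1);
    return svc.onConstructionCompleted();
}
```

In this model staleContinuationIsSafe() returns false and orderedContinuationIsSafe() returns true, which matches the window described above.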

The proposed fix is to use the executor provided by repl::PrimaryOnlyService, since that executor is interrupted on every onStepDown, then joined and recreated on every onStepUp.
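The difference between the two executor lifetimes can be sketched as follows. Everything here is hypothetical (ToyExecutor, ToyPrimaryOnlyService, countStaleContinuations) and single-threaded. The toy simply drops interrupted work, whereas a real executor would likely surface the interruption as an error to the continuation.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical single-threaded stand-in for an executor: tasks run only when
// drain() is called.
class ToyExecutor {
public:
    // Queues a task; work submitted after shutdown() is dropped.
    void schedule(std::function<void()> task) {
        if (!_shutDown) {
            _tasks.push_back(std::move(task));
        }
    }

    // Models interrupt-and-join: pending work is discarded, new work is rejected.
    void shutdown() {
        _shutDown = true;
        _tasks.clear();
    }

    // Runs all queued tasks and returns how many ran.
    int drain() {
        std::vector<std::function<void()>> pending;
        pending.swap(_tasks);
        for (auto& task : pending) {
            task();
        }
        return static_cast<int>(pending.size());
    }

private:
    bool _shutDown = false;
    std::vector<std::function<void()>> _tasks;
};

// Hypothetical service owning both kinds of executor.
struct ToyPrimaryOnlyService {
    // Never shut down, so queued work outlives a step-down.
    std::shared_ptr<ToyExecutor> cleanupExecutor = std::make_shared<ToyExecutor>();
    // Shut down on step-down and replaced on step-up.
    std::shared_ptr<ToyExecutor> termExecutor = std::make_shared<ToyExecutor>();

    void onStepDown() { termExecutor->shutdown(); }
    void onStepUp() { termExecutor = std::make_shared<ToyExecutor>(); }
};

struct StaleRuns {
    int onCleanupExecutor;
    int onTermExecutor;
};

// Schedules one continuation on each executor, cycles through step-down and
// step-up, then counts which continuations still run in the new term.
inline StaleRuns countStaleContinuations() {
    ToyPrimaryOnlyService svc;
    int ranFromCleanup = 0;
    int ranFromTerm = 0;
    svc.cleanupExecutor->schedule([&] { ++ranFromCleanup; });
    svc.termExecutor->schedule([&] { ++ranFromTerm; });
    svc.onStepDown();
    svc.onStepUp();
    svc.cleanupExecutor->drain();
    svc.termExecutor->drain();
    return {ranFromCleanup, ranFromTerm};
}
```

Here the continuation queued on the cleanup executor still runs after the step-up, while the one queued on the per-term executor does not.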

Side note: the same issue also affects the completion future.
Besides fixing the executor there, ShardingDDLCoordinatorService::_onServiceTermination() also has to clear _numActiveCoordinatorsPerType and call _recoveredOrCoordinatorCompletedCV.notify_all().
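A rough sketch of that termination cleanup, assuming a per-type counter map guarded by a mutex and a condition variable. The member names mirror the ones mentioned above, but ToyCoordinatorRegistry, its methods, and the "createCollection" type label used in the note below are hypothetical and not the actual service code.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>

// Hypothetical registry tracking how many coordinators of each type are active.
class ToyCoordinatorRegistry {
public:
    void coordinatorStarted(const std::string& type) {
        std::lock_guard<std::mutex> lk(_mutex);
        ++_numActiveCoordinatorsPerType[type];
    }

    // Step-down hook: forget every active coordinator and wake all waiters so
    // they re-check their predicate instead of waiting on stale counts.
    void onServiceTermination() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _numActiveCoordinatorsPerType.clear();
        }
        _recoveredOrCoordinatorCompletedCV.notify_all();
    }

    // Blocks until no coordinator of the given type is active.
    void waitForNoneOfType(const std::string& type) {
        std::unique_lock<std::mutex> lk(_mutex);
        _recoveredOrCoordinatorCompletedCV.wait(
            lk, [&] { return _activeLocked(type) == 0; });
    }

    std::size_t active(const std::string& type) {
        std::lock_guard<std::mutex> lk(_mutex);
        return _activeLocked(type);
    }

private:
    // Caller must hold _mutex.
    std::size_t _activeLocked(const std::string& type) const {
        auto it = _numActiveCoordinatorsPerType.find(type);
        return it == _numActiveCoordinatorsPerType.end() ? 0 : it->second;
    }

    std::mutex _mutex;
    std::condition_variable _recoveredOrCoordinatorCompletedCV;
    std::map<std::string, std::size_t> _numActiveCoordinatorsPerType;
};
```

For example, after two coordinatorStarted("createCollection") calls, onServiceTermination() brings active("createCollection") back to 0, so a later waitForNoneOfType("createCollection") returns immediately. Without the clear, that wait would depend on completions from the previous term that may never be delivered.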

is caused by: SERVER-90330 Creation of DDL coordinator hang indefinetly if executed on secondary node (Closed)

Assignee: Wolfee Farkas
Reporter: Wolfee Farkas
Participants: Githook User, Wolfee Farkas
Votes: 0
Watchers: 2

Created: Jun 06 2024 01:28:02 PM UTC
Updated: Jun 27 2024 07:49:35 AM UTC
Resolved: Jun 07 2024 12:27:46 PM UTC
Confidence Status Last Update: 06/Jun/24 1:28 PM