Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 7.3.0-rc0, 7.0.5, 6.0.13, 5.0.24, 4.4.28
Affects Version/s: 4.4.0, 5.0.0, 6.0.0, 7.0.0
Component/s: Sharding
Labels: None
Assigned Teams: Cluster Scalability
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v7.0, v6.0, v5.0, v4.4
Sprint: Cluster Scalability 2023-11-27, Cluster Scalability 2023-12-11, Cluster Scalability 2023-12-25
Linked BF Score: 155
Confidence Status: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name: None
Goal Link: None

Consider a TransactionCoordinator that has sent the prepare command to the participants and then crashes. The new primary, on stepup, will resume the coordination. There are several points at which this can stall behind a read/write ticket acquisition. This is undesirable, both for performance and because it can cause deadlocks.

Ticket acquisitions occur:
(1) When TransactionCoordinatorService::onStepUp calls replClientInfo.setLastOpToSystemLastOpTime, which takes the GlobalLock in MODE_IX.
(2) When TransactionCoordinatorService::onStepUp reads config.transaction_coordinators.
(3) When waiting for durable VectorClock. This sometimes results in a write (the first time after stepup, or upon topology changes).
(4) When (re-)persisting the participant list. Note that even though the list had already been persisted, if the coordinator had not yet persisted the decision, recovery will persist the participant list again. As a separate improvement, we should also consider skipping this redundant write.

~~SERVER-60682~~ made persisting the decision skip ticket acquisition, but did not address these other situations that occur on recovery.

In addition to not skipping ticket acquisition, (1) and (3) do not skip FlowControl either.
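
To illustrate why stalling behind ticket acquisition can turn into a deadlock, here is a toy Python model. It is not MongoDB code: `TicketPool`, the `exempt` flag, and `coordinator_recovers` are hypothetical names invented for this sketch, and it assumes one plausible scenario in which every ticket is held by an operation that is itself waiting for the prepared transaction to be resolved.

```python
import threading


class TicketPool:
    """Toy stand-in for a storage-engine ticket pool (hypothetical)."""

    def __init__(self, size):
        self._sem = threading.BoundedSemaphore(size)

    def try_acquire(self, exempt=False, timeout=0.2):
        # An exempt operation skips ticketing entirely.
        if exempt:
            return True
        return self._sem.acquire(timeout=timeout)

    def release(self, exempt=False):
        if not exempt:
            self._sem.release()


def coordinator_recovers(exempt, pool_size=2):
    """Return True if the recovering coordinator manages to do its write."""
    pool = TicketPool(pool_size)
    resolved = threading.Event()  # set once the prepared txn is resolved
    all_holding = threading.Barrier(pool_size + 1)

    def blocked_writer():
        # Takes a ticket, then waits on the prepared transaction.
        assert pool.try_acquire()
        all_holding.wait()
        resolved.wait(timeout=2.0)
        pool.release()

    writers = [threading.Thread(target=blocked_writer) for _ in range(pool_size)]
    for w in writers:
        w.start()
    all_holding.wait()  # from here on, every ticket is taken

    # The coordinator must write before the transaction can be resolved.
    got_ticket = pool.try_acquire(exempt=exempt)
    if got_ticket:
        pool.release(exempt=exempt)

    # Unblock the writers in both cases so the demo always terminates;
    # in the non-exempt case a real system would remain stuck here.
    resolved.set()
    for w in writers:
        w.join()
    return got_ticket


print("with ticket acquisition:", coordinator_recovers(exempt=False))
print("skipping ticket acquisition:", coordinator_recovers(exempt=True))
```

In this model the non-exempt coordinator times out because the only operations that could free a ticket are waiting on the coordinator itself, while the exempt coordinator proceeds immediately. The same reasoning would apply to each of the four recovery steps listed above, under the stated assumption.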

Related to: SERVER-60682 TransactionCoordinator may block acquiring WiredTiger write ticket to persist its decision, prolonging transactions being in the prepared state (Closed)

Assignee: Wenqin Ye
Reporter: Jordi Serra Torrens
Participants: Githook User, Jordi Serra Torrens, Josef Ahmad, Wenqin Ye
Votes: 0
Watchers: 11
Created: Nov 07 2023 05:43:57 PM UTC
Updated: Jan 01 2024 08:00:25 PM UTC
Resolved: Dec 12 2023 06:57:37 PM UTC
Confidence Status Last Update: 21/Nov/23 5:07 PM
GA Target Date: None
Public Preview Target Date: None
Private Preview Target Date: None
Experiment Target Date: None
