Loading...

XML

Word

Printable

JSON

Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Sharding
Labels:
- ShardedTxn:FutureOptimizations

Assigned Teams:

Cluster Scalability
Confidence Status:
None
Work Order:
3

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

As of 4.2 our 2 PC implementation requires 4x majority commits. This is usually the dominating portion of commit latency.

(Below persist = w:majority commit)

1. transaction coordinator shard persists information about what other participants are part of this transaction.
2. all participants persist the prepare phase of their part of the transaction
3. transaction coordinator persists the commit or abort decision
4. all participants persist commit or abort as directed by the coordinator

Step 1 can be avoided by folding it into step 2:

As part of the prepare message, coordinator embeds the list of participants and the identity of the coordinator. (The coordinator is one of the participants.) Participants persist this information together with persisting the prepare state of the transaction itself.

Recovery in case of failure:

2a) Remember that the original coordinator is itself a replica set / shard. Assuming that the original coordinator had succeeded in persisting its prepare message, and succeeds in electing a new primary and connecting to the rest of the cluster, then the new primary can simply resume the role of the coordinator and resend the prepare message to all participants. (In this scenario the coordinator can continue acting as the coordinator until the end of the 2PC protocol.)

2b) If the original coordinator fails before it had succeeded in persisting its own prepare message, it will have lost state that is necessary to continue as coordinator. (In fact, it doesn't even know it is a participant in this 2PC transaction!) Since at least 1 participant has failed the prepare phase and lost transaction state, the only possible outcome is that the transaction must be aborted. The other participants can learn about this by: after a timeout, poll (with readConcern=majority) all other participants for their state and if at least one other participant has lost the transaction, they must abort too. (It's also possible that a participant enters this state and learns that at least one participant has committed the transaction. In this case it can also commit its transaction.)

2c) If none of the participants have received or persisted the prepare message, then the transaction will eventually time out on each of them.

In step 3 the coordinator will first persist its abort or commit decision for itself. (Same as current behavior.) As per ~~SERVER-37364~~ it can now return the result of the transaction to the client.

In step 4 the coordinator sends abort/commit to all participants. In case of any failure, the coordinator can just keep resending the message until all participants have acknowledged it.

is related to

SERVER-37364 Coordinator should return the decision to the client as soon as the decision is durable

Closed

related to

SERVER-63971 Change server parameter to default to read-your-writes behavior after 2PC transaction

Closed

Assignee:: [DO NOT USE] Backlog - Cluster Scalability
Reporter:: Henrik Ingo (Inactive)
Participants:: [DO NOT USE] Backlog - Cluster Scalability, Andy Schwerin, Henrik Ingo, Mira Carey
Votes:: 0 Vote for this issue
Watchers:: 21 Start watching this issue

Created:: Mar 26 2020 01:21:14 PM UTC
Updated:: Dec 12 2023 03:49:09 PM UTC

Details

Description

Attachments

Issue Links

Activity

People

Dates