Context
This issue occurs when the TransactionCoordinator is also a participant. The local transaction reaper is triggered before the TransactionCoordinator sends the abortTransaction command to the local transaction (which it also does because of a timeout). The coordinator sends the abort command to all of the participants, but since the coordinator is also a participant, it uses handleRequest to abort the local transaction.
The underlying function that handles the request has special logic for the case where the coordinator is also the participant: instead of going through the network, it calls handleRequest directly. This is the origin of the stack frame above.
That call to handleRequest gets stuck because the ServiceEntryPoint attempts to do a no-op write after the abortTransaction command fails with a NoSuchTransaction error.
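To make the ordering concrete, here is a toy model of the race in plain JavaScript. It is only a sketch: ToyShard, reapExpired, sendAbort, and the networkSends counter are hypothetical names invented for illustration and do not correspond to server code. It shows only that a request addressed to the coordinator's own node is dispatched straight to handleRequest, and that the abort fails with NoSuchTransaction once the reaper has already removed the transaction. It does not model the no-op write or the resulting hang.

```javascript
// Toy model only: every name here is hypothetical, not MongoDB server code.
class ToyShard {
    constructor(name) {
        this.name = name;
        this.openTxns = new Set();  // ids of transactions currently open on this node
        this.networkSends = 0;      // counts requests that would leave this node
    }

    // Participant-side handler: aborting an unknown transaction fails.
    handleRequest(request) {
        if (!this.openTxns.delete(request.txnId)) {
            return {ok: 0, codeName: "NoSuchTransaction"};
        }
        return {ok: 1};
    }

    // Stand-in for the local transaction reaper timing out a transaction.
    reapExpired(txnId) {
        this.openTxns.delete(txnId);
    }

    // Coordinator-side send: a request addressed to this node skips the network.
    sendAbort(target, txnId) {
        const request = {abortTransaction: 1, txnId};
        if (target === this) {
            return this.handleRequest(request);
        }
        this.networkSends++;
        return target.handleRequest(request);
    }
}

const shard = new ToyShard("shard0");
shard.openTxns.add("txn-1");
shard.reapExpired("txn-1");                     // the reaper wins the race
const reply = shard.sendAbort(shard, "txn-1");  // the coordinator aborts itself locally
```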
Proposal
The fix required to make the test work as expected is for the transaction coordinator assert.soon to accept the coordinator being in any step equal to or past writingDecision. The new assert.soon function that checks the server status of the transaction coordinator should look something like this:
let twoPhaseCommitCoordinatorServerStatus;
assert.soon(
    () => {
        twoPhaseCommitCoordinatorServerStatus =
            txnCoordinator.getDB(dbName).serverStatus().twoPhaseCommitCoordinator;
        const deletingCoordinatorDoc =
            twoPhaseCommitCoordinatorServerStatus.currentInSteps.deletingCoordinatorDoc;
        const waitingForDecisionAcks =
            twoPhaseCommitCoordinatorServerStatus.currentInSteps.waitingForDecisionAcks;
        const writingDecision =
            twoPhaseCommitCoordinatorServerStatus.currentInSteps.writingDecision;
        return deletingCoordinatorDoc.toNumber() === 1 ||
            waitingForDecisionAcks.toNumber() === 1 ||
            writingDecision.toNumber() === 1;
    },
    () => `Failed to find 1 total transactions in the deletingCoordinatorDoc state: ${
        tojson(twoPhaseCommitCoordinatorServerStatus)}`);
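The acceptance condition in the proposed assert.soon could also be factored into a small predicate, which keeps the list of accepted steps in one place. The following is a minimal sketch, not part of the proposed patch: the helper names isAtOrPastWritingDecision and asNumber are assumptions, and only the three step names come from the snippet in this ticket. asNumber accepts either a value with a toNumber() method, as the shell returns in currentInSteps, or a plain number, so the predicate can be exercised outside the shell.

```javascript
// Hypothetical helper names; only the three step names come from this ticket.
const STEPS_AT_OR_PAST_WRITING_DECISION = [
    "writingDecision",
    "waitingForDecisionAcks",
    "deletingCoordinatorDoc",
];

// Converts a value exposing toNumber() (such as a shell NumberLong) or a
// plain number into a JavaScript number.
function asNumber(value) {
    if (value !== null && typeof value === "object" && typeof value.toNumber === "function") {
        return value.toNumber();
    }
    return Number(value);
}

// Returns true when exactly one coordinator is reported in at least one of
// the accepted steps, mirroring the "=== 1" checks in the proposed assert.soon.
function isAtOrPastWritingDecision(currentInSteps) {
    return STEPS_AT_OR_PAST_WRITING_DECISION.some(
        (step) => asNumber(currentInSteps[step]) === 1);
}
```

Inside the test, the callback passed to assert.soon would then reduce to returning isAtOrPastWritingDecision(twoPhaseCommitCoordinatorServerStatus.currentInSteps).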
is caused by: SERVER-60685 TransactionCoordinator may interrupt locally executing update with non-Interruption error category, leading to server crash (Closed)