Type: Bug
Resolution: Fixed
Priority: Minor - P4
Affects Version/s: 4.1.5
Component/s: Replication
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2019-04-08
Based on my conversation with tess.avitabile, I think the following bug exists in the transaction metrics' active and inactive counts (it may affect other metrics as well; I'm not sure):
If a transaction is aborted, whether the number of active or inactive transactions is decremented depends on whether the TxnResources were stashed at the time of the abort:
- if they were stashed, the inactive count is decremented;
- if they were not stashed, the active count is decremented.
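The abort-time decision above can be sketched as a small C++ model. This is a hedged illustration, not the server's code: `TxnCounters`, `StashedResources`, and `onAbortModel` are hypothetical names, and `std::optional` stands in for the `boost::optional` that holds `_txnResourceStash`.

```cpp
#include <cassert>
#include <optional>

// Hypothetical counters; illustrative names, not MongoDB's real metrics class.
struct TxnCounters {
    long long currentActive = 0;
    long long currentInactive = 0;
};

// Hypothetical placeholder for stashed TxnResources.
struct StashedResources {};

// Models the abort path: only the presence of a stash decides which
// counter is decremented; nothing checks whether the transaction was
// ever counted as active.
void onAbortModel(TxnCounters& counters,
                  const std::optional<StashedResources>& txnResourceStash) {
    if (txnResourceStash.has_value()) {
        --counters.currentInactive;  // stashed: assumed to be inactive
    } else {
        --counters.currentActive;    // not stashed: assumed to be active
    }
}
```

For example, calling `onAbortModel` with an empty stash on a transaction that has so far only been counted as inactive drives `currentActive` to -1, which is the mismatch described below.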
This usually works because, in a transaction request's flow:
- First, TransactionParticipant::beginOrContinue calls TransactionMetricsObserver::onStart, which increments the number of currently inactive transactions.
- Later, TransactionParticipant::unstashTransactionResources calls TransactionMetricsObserver::onUnstash (whether the TxnResources already existed or were just created), which increments the number of currently active transactions and decrements the number of currently inactive transactions.
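The usual request flow amounts to two counter updates. A sketch under assumptions: `onStartModel`, `onUnstashModel`, and `normalRequestFlow` are hypothetical helpers that only mirror the counter changes described above, not the real `TransactionMetricsObserver` methods.

```cpp
#include <cassert>

// Hypothetical counters; illustrative names, not MongoDB's real metrics class.
struct TxnCounters {
    long long currentActive = 0;
    long long currentInactive = 0;
};

// Mirrors onStart as called from beginOrContinue:
// a newly started transaction is first counted as inactive.
void onStartModel(TxnCounters& c) {
    ++c.currentInactive;
}

// Mirrors onUnstash as called from unstashTransactionResources:
// the transaction moves from the inactive count to the active count.
void onUnstashModel(TxnCounters& c) {
    ++c.currentActive;
    --c.currentInactive;
}

// Runs the usual flow for one new transaction and returns the counters.
TxnCounters normalRequestFlow() {
    TxnCounters c;
    onStartModel(c);    // beginOrContinue:             inactive 1, active 0
    onUnstashModel(c);  // unstashTransactionResources: inactive 0, active 1
    return c;
}
```

Because each step moves exactly one unit between the two counts, the counts stay consistent as long as both steps run, and run in this order.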
However, TransactionParticipant::abortArbitraryTransaction can be called outside of a checked-out Session, so the following interleaving can occur, which leaves the metrics incorrect for a short period:
// inactive: 0
// active: 0

// Starts *new* transaction; increments inactive count
Thread 1: TransactionParticipant::beginOrContinue
// inactive: 1
// active: 0

// TxnResources have not been created, so _txnResourceStash is boost::none;
// interprets this as meaning the transaction is active and decrements active count
Thread 2: TransactionParticipant::abortArbitraryTransaction
// inactive: 1   <----- metric incorrect
// active: -1    <----- metric incorrect

Thread 1: TransactionParticipant::unstashTransactionResources
// inactive: 0   <----- metric remedied
// active: 0     <----- metric remedied
Worse, if TransactionParticipant::unstashTransactionResources throws before calling TransactionMetricsObserver::onUnstash (for example, by timing out while waiting to acquire the GlobalLock), then the inactive and active counts may remain permanently incorrect; in the sequence above, they would stay at 1 and -1 respectively.
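The race and the failure mode can be replayed deterministically in a single thread, since only the order of the calls matters. In this sketch, `ParticipantModel`, `replay`, and the `lockAcquired` flag are hypothetical: the flag simulates `unstashTransactionResources` throwing (for instance on a GlobalLock timeout) before the onUnstash bookkeeping runs, and the method names only echo the `TransactionParticipant` functions they stand in for.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>

// Hypothetical counters; illustrative names, not MongoDB's real metrics class.
struct TxnCounters {
    long long currentActive = 0;
    long long currentInactive = 0;
};

struct StashedResources {};  // hypothetical placeholder for TxnResources

struct ParticipantModel {
    TxnCounters counters;
    std::optional<StashedResources> txnResourceStash;  // empty, like boost::none

    // beginOrContinue -> onStart: new transaction counted as inactive.
    void beginOrContinue() { ++counters.currentInactive; }

    // abortArbitraryTransaction, run without the Session checked out:
    // an empty stash is misread as "active".
    void abortArbitraryTransaction() {
        if (txnResourceStash.has_value()) {
            --counters.currentInactive;
        } else {
            --counters.currentActive;
        }
    }

    // unstashTransactionResources; throws before the onUnstash-equivalent
    // update when lockAcquired is false.
    void unstashTransactionResources(bool lockAcquired) {
        if (!lockAcquired) {
            throw std::runtime_error("simulated global lock timeout");
        }
        ++counters.currentActive;  // onUnstash
        --counters.currentInactive;
    }
};

// Replays Thread 1 / Thread 2 in the order shown above.
TxnCounters replay(bool lockAcquired) {
    ParticipantModel p;
    p.beginOrContinue();            // Thread 1
    p.abortArbitraryTransaction();  // Thread 2
    try {
        p.unstashTransactionResources(lockAcquired);  // Thread 1
    } catch (const std::runtime_error&) {
        // The request fails; nothing restores the counters.
    }
    return p.counters;
}
```

`replay(true)` ends with both counters back at 0, while `replay(false)` leaves inactive at 1 and active at -1 with nothing left in the model to correct them.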