-
Type: Task
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: Sharding
-
Query Optimization
-
Fully Compatible
-
v8.0, v7.3, v7.0, v6.0, v5.0
-
QO 2024-07-22, QO 2024-08-05
The txn_recover_decision_using_recovery_router.js test runs its transactions without any retry logic for transient errors. A secondary running _flushRoutingTableCacheUpdates against the primary of the replica set shard can trigger a shard version refresh on that primary. The refresh takes CollectionShardingRuntime::_stateChangeMutex (aka CSRLock) in MODE_X, so a transaction that needs the same lock in MODE_IS can fail to acquire it in time and abort with a LockTimeout. In the logs below, the find that starts transaction 15 waits about 5ms on the lock while the refresh runs and then fails:
[js_test:txn_recover_decision_using_recovery_router] s23547| 2022-08-25T09:04:15.033+00:00 D3 EXECUTOR 22607 [conn12] "Scheduling remote command request","attr":{"request":"RemoteCommand 115 -- target:[ip-10-128-58-199:23542] db:test cmd:{ find: \"user\", filter: { x: 1.0 }, shardVersion: { t: Timestamp(1661418244, 76), e: ObjectId('63073b04ea5d5cbf63d64294'), v: Timestamp(2, 0) }, txnNumber: 15, clientOperationKey: UUID(\"3f4bbcec-212d-4c25-a292-2f0aa0312afe\"), readConcern: {}, startTransaction: true, autocommit: false, lsid: { id: UUID(\"c66be7c9-c7f6-450e-9846-ec23ef55a7fe\"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) } }"}
...
[js_test:txn_recover_decision_using_recovery_router] d23542| 2022-08-25T09:04:15.034+00:00 D4 TXN 23984 [conn50] "New transaction started","attr":{"txnNumberAndRetryCounter":{"txnNumber":15,"txnRetryCounter":0},"lsid":{"id":{"$uuid":"c66be7c9-c7f6-450e-9846-ec23ef55a7fe"},"uid":{"$binary":{"base64":"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=","subType":"0"}}},"apiParameters":{}}
[js_test:txn_recover_decision_using_recovery_router] d23542| 2022-08-25T09:04:15.039+00:00 I SH_REFR 4619901 [CatalogCache-1] "Refreshed cached collection","attr":{"namespace":"test.user","lookupSinceVersion":"2|1||63073b04ea5d5cbf63d64294||Timestamp(1661418244, 76)","newVersion":"{ chunkVersion: { t: Timestamp(1661418244, 76), e: ObjectId('63073b04ea5d5cbf63d64294'), v: Timestamp(2, 1) }, forcedRefreshSequenceNum: 11, epochDisambiguatingSequenceNum: 10 }","timeInStore":"{ chunkVersion: \"None\", forcedRefreshSequenceNum: 10, epochDisambiguatingSequenceNum: 9 }","durationMillis":6}
[js_test:txn_recover_decision_using_recovery_router] d23542| 2022-08-25T09:04:15.039+00:00 I TXN 51802 [conn50] "transaction","attr":{"parameters":{"lsid":{"id":{"$uuid":"c66be7c9-c7f6-450e-9846-ec23ef55a7fe"},"uid":{"$binary":{"base64":"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=","subType":"0"}}},"txnNumber":15,"txnRetryCounter":0,"autocommit":false,"readConcern":{"provenance":"implicitDefault"}},"readTimestamp":"Timestamp(0, 0)","terminationCause":"aborted","timeActiveMicros":5334,"timeInactiveMicros":24,"numYields":0,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"w":1}},"ReplicationStateTransition":{"acquireCount":{"w":3}},"Global":{"acquireCount":{"w":1}},"Database":{"acquireCount":{"w":1}},"Collection":{"acquireCount":{"w":1}},"Mutex":{"acquireCount":{"r":2},"acquireWaitCount":{"r":1},"timeAcquiringMicros":{"r":5127}}},"storage":{},"wasPrepared":false,"durationMillis":5}
...
[js_test:txn_recover_decision_using_recovery_router] s23547| 2022-08-25T09:04:15.040+00:00 D2 ASIO 22597 [conn12] "Request finished with response","attr":{"requestId":115,"isOK":true,"response":"{ errorLabels: [ \"TransientTransactionError\" ], ok: 0.0, errmsg: \"Unable to acquire IS lock on '{11529215046068469802: Mutex, 42, test.user}' within 5ms.\", code: 24, codeName: \"LockTimeout\", lastCommittedOpTime: Timestamp(1661418253, 1), $clusterTime: { clusterTime: Timestamp(1661418255, 1), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, $configTime: Timestamp(1661418253, 2), $topologyTime: Timestamp(1661418244, 11), operationTime: Timestamp(1661418255, 1) }"}
The txn_recover_decision_using_recovery_router.js test should be updated to either (a) wrap each of its test cases in the retryOnceOnTransientOnMongos() helper or (b) raise the value of the maxTransactionLockRequestTimeoutMillis server parameter so that the transaction no longer fails with a LockTimeout while the refresh holds the lock.
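The two options can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code of the real retryOnceOnTransientOnMongos() helper: the function names retryOnceOnTransientError, hasTransientTransactionLabel and makeLockTimeoutCommand are hypothetical, and the sketch assumes the failure surfaces as a thrown error carrying an errorLabels array, as in the response logged above. The timeout value passed in option (b) is left to the caller; the ticket does not state which value to use.

```javascript
// Hypothetical stand-ins for illustration only; not the jstest library code.

// Returns true if the error carries the TransientTransactionError label,
// as the LockTimeout response in the log above does.
function hasTransientTransactionLabel(err) {
    return Boolean(err) && Array.isArray(err.errorLabels) &&
        err.errorLabels.includes("TransientTransactionError");
}

// Option (a): run func(); if it throws an error labeled
// TransientTransactionError, run it exactly one more time.
// Any other error, or a second failure, propagates to the caller.
function retryOnceOnTransientError(func) {
    try {
        return func();
    } catch (err) {
        if (!hasTransientTransactionLabel(err)) {
            throw err;
        }
        return func();
    }
}

// Option (b): build the setParameter command document that raises
// maxTransactionLockRequestTimeoutMillis. In a test it would be sent to
// each shard node with adminCommand(); here it is only constructed.
function makeLockTimeoutCommand(timeoutMillis) {
    return {setParameter: 1, maxTransactionLockRequestTimeoutMillis: timeoutMillis};
}

// Usage: simulate a transaction body that hits a LockTimeout on its first
// attempt and succeeds on the retry.
let attempts = 0;
const outcome = retryOnceOnTransientError(() => {
    attempts++;
    if (attempts === 1) {
        const err = new Error("Unable to acquire IS lock within 5ms.");
        err.code = 24;  // LockTimeout
        err.errorLabels = ["TransientTransactionError"];
        throw err;
    }
    return "committed";
});
```

Retrying only once keeps a genuine, repeatable failure visible in the test, while a single unlucky overlap with a refresh no longer fails the run.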
- is related to: SERVER-60758 Prevent dbVersion refreshes from failing transactions in txn_recover_decision_using_recovery_router.js (Closed)
- related to: SERVER-90149 Increase lock timeout in update_shard_key_bulk_write.js (Closed)