-
Type: Task
-
Resolution: Won't Do
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: Sharding
-
Sharding NYC
As part of my work for SERVER-44409, I ran into many ConflictingOperationInProgress errors, e.g.:
[fsm_workload_test:CRUD_and_commands] 2020-03-09T18:19:52.516+0000 Foreground jstests/concurrency/fsm_workloads/CRUD_and_commands.js
[fsm_workload_test:CRUD_and_commands] 2020-03-09T18:19:52.516+0000 Error: command failed: {
[fsm_workload_test:CRUD_and_commands] 2020-03-09T18:19:52.516+0000     "ok" : 0,
[fsm_workload_test:CRUD_and_commands] 2020-03-09T18:19:52.516+0000     "errmsg" : "unable to initialize targeter for write op for collection test18_fsmdb0.fsmcoll0 :: caused by :: No chunks were found for the collection",
[fsm_workload_test:CRUD_and_commands] 2020-03-09T18:19:52.516+0000     "code" : 117,
[fsm_workload_test:CRUD_and_commands] 2020-03-09T18:19:52.516+0000     "codeName" : "ConflictingOperationInProgress",
[fsm_workload_test:CRUD_and_commands] 2020-03-09T18:19:52.516+0000     "operationTime" : Timestamp(1583777991, 80),
[fsm_workload_test:CRUD_and_commands] 2020-03-09T18:19:52.517+0000     "$clusterTime" : {
[fsm_workload_test:CRUD_and_commands] 2020-03-09T18:19:52.517+0000         "clusterTime" : Timestamp(1583777991, 80),
[fsm_workload_test:CRUD_and_commands] 2020-03-09T18:19:52.517+0000         "signature" : {
[fsm_workload_test:CRUD_and_commands] 2020-03-09T18:19:52.517+0000             "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
[fsm_workload_test:CRUD_and_commands] 2020-03-09T18:19:52.517+0000             "keyId" : NumberLong(0)
[fsm_workload_test:CRUD_and_commands] 2020-03-09T18:19:52.517+0000         }
[fsm_workload_test:CRUD_and_commands] 2020-03-09T18:19:52.517+0000     }
[fsm_workload_test:CRUD_and_commands] 2020-03-09T18:19:52.517+0000 }
I encountered this error both inside and outside of transactions. Per a discussion with jack.mulrow, an aggressive concurrency workload that runs dropCollection in parallel with CRUD ops in the sharding suites can hit this kind of error even though CRUD ops and dropCollection take conflicting locks.
Can we consider adding a TransientTransactionError label when we encounter this error, to facilitate retrying? Conceptually, this seems like a similar case to the existing TransientTransactionError cases.
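To illustrate what the label would buy a caller, here is a rough sketch of a retry wrapper. The names withRetry, isRetryable, and maxAttempts are hypothetical and are not part of the server or the FSM framework. The sketch assumes the thrown error exposes the server's "code" and "errorLabels" fields. Without the label, the caller has to special-case code 117; with it, the generic label check would be enough.

```javascript
// Hypothetical helper for illustration only; not an existing server or test API.
// Code 117 is ConflictingOperationInProgress, as seen in the failure above.
const CONFLICTING_OPERATION_IN_PROGRESS = 117;

// Decide whether a failed operation is worth retrying.
function isRetryable(err) {
    const labels = Array.isArray(err.errorLabels) ? err.errorLabels : [];
    // The generic check that would suffice if the label were attached.
    if (labels.includes("TransientTransactionError")) {
        return true;
    }
    // The special case callers need today, since the label is absent.
    return err.code === CONFLICTING_OPERATION_IN_PROGRESS;
}

// Run fn, retrying up to maxAttempts times while the error is retryable.
function withRetry(fn, maxAttempts = 3) {
    let lastErr;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            return fn(attempt);
        } catch (err) {
            if (!isRetryable(err)) {
                throw err; // Unrelated failure: surface it immediately.
            }
            lastErr = err;
        }
    }
    throw lastErr; // Retries exhausted.
}
```

A workload would wrap each CRUD op (or each transaction body) in withRetry, so that a concurrent dropCollection causes a retry rather than a test failure.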