- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: None
- Server Programmability
- Service Arch 2022-06-13, Service Arch 2022-06-27, Service Arch 2022-07-11
- 7
Some places in the code catch exceptions in ErrorCategory::Interruption as a way to check whether an OperationContext has been interrupted. This breaks if any call site calls OperationContext::markKilled with an error code that isn't in this category, and nothing currently prevents that from happening. Using the Interruption category to check for OperationContext interrupt is also error-prone in the other direction: other code can throw an Interruption error without the OperationContext having been interrupted. Call sites that need to check for interrupt should therefore probably catch all exceptions and then check the OperationContext itself to see whether it has been interrupted.
This ticket should either:
- Add an invariant to markKilled to ensure that every error code passed to it belongs to ErrorCategory::Interruption, and fix the call sites that violate it, or
- Change every place that catches Interruption errors and assumes from that alone that the OperationContext has been interrupted, so that it instead catches all DBExceptions and checks the OperationContext itself for interrupt
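To make the two options concrete, here is a minimal, self-contained sketch (C++17). Everything in it is a hypothetical stand-in written for illustration: `Code`, `isInterruption`, `isKilled`, `markKilledStrict`, and the two helper functions are assumptions, and the `OperationContext` and `DBException` shown here are toy models, not the real server classes. It only shows why inferring interruption from the error category can be wrong in both directions, and what checking the operation's own state looks like.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical error codes; only the first is treated as "Interruption".
enum class Code { kInterruptionCode, kNonInterruptionCode };

// Toy model of an error category check (assumption, not the server's API).
inline bool isInterruption(Code c) { return c == Code::kInterruptionCode; }

// Toy exception type carrying an error code.
struct DBException : std::runtime_error {
    DBException(Code c, const std::string& what) : std::runtime_error(what), code(c) {}
    Code code;
};

// Toy operation context that remembers the first kill code it was given.
class OperationContext {
public:
    // Accepts any code, which is the situation the ticket describes.
    void markKilled(Code c) {
        if (!_killCode) _killCode = c;
    }

    // Option 1: reject codes outside the category. assert() stands in for
    // an invariant here and is compiled out when NDEBUG is defined.
    void markKilledStrict(Code c) {
        assert(isInterruption(c));
        markKilled(c);
    }

    bool isKilled() const { return _killCode.has_value(); }

    // Throws the stored kill code if the operation has been killed.
    void checkForInterrupt() const {
        if (_killCode) throw DBException(*_killCode, "operation was interrupted");
    }

private:
    std::optional<Code> _killCode;
};

using Work = std::function<void(OperationContext&)>;

// Fragile pattern: decides from the category of the caught error.
inline bool interruptedByCategory(OperationContext& opCtx, const Work& work) {
    try {
        work(opCtx);
    } catch (const DBException& ex) {
        return isInterruption(ex.code);
    }
    return false;
}

// Option 2: catch every DBException, then ask the context itself.
inline bool interruptedByState(OperationContext& opCtx, const Work& work) {
    try {
        work(opCtx);
    } catch (const DBException&) {
        return opCtx.isKilled();
    }
    return false;
}
```

In this model, an operation killed with `kNonInterruptionCode` is missed by `interruptedByCategory` but detected by `interruptedByState`, while an unrelated error that happens to carry `kInterruptionCode` is misreported as an interrupt by the category check only.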
—
We determined that the appropriate resolution is to remove the Interruption category, because it has built up a lot of cruft and lost its meaning over time. Several tickets have been logged to address the sections of the code base that use this category. Once those are complete, removing the category itself should be trivial.
- depends on
  - SERVER-67606 Stop using ErrorCategory::Interruption in Server tests (Closed)
  - SERVER-67611 Stop using ErrorCategory::Interruption in Execution codebase (Closed)
  - SERVER-67615 Stop using ErrorCategory::Interruption in Query codebase (Closed)
  - SERVER-67617 Stop using ErrorCategory::Interruption in Replication codebase (Closed)
  - SERVER-67618 Stop using ErrorCategory::Interruption in Sharding codebase (Closed)
- is related to
  - SERVER-70010 Stop using getKillStatus to check for OperationContext interruption. (Closed)
  - SERVER-55323 Integrate CancelableOperationContext into ReshardingOplogApplier (Closed)
  - SERVER-78722 CancelableOperationContext::cancel() should pass a Status through to operations checking for interrupt, rather than a generic ErrorCodes::Interrupted & "operation was interrupted" (Open)
- related to
  - SERVER-55379 Invariant failure _requests.empty() at src/mongo/db/concurrency/lock_state.cpp 289 (Closed)
  - SERVER-60685 TransactionCoordinator may interrupt locally executing update with non-Interruption error category, leading to server crash (Closed)
  - SERVER-68808 Catch CancellationError in ReplClientInfo (Closed)
  - SERVER-85912 Audit code paths explicitly checking for ErrorCodes::Interrupted (Backlog)