-
Type: Task
-
Resolution: Unresolved
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
Networking & Observability
-
0
Today when running in pinning mode, the PinnedConnectionTaskExecutor that TaskExecutorCursor uses schedules a callback that captures a reference to itself on the underlying executor, to ensure that it is destroyed on the underlying executor.
We extend the lifetime of the PinnedConnectionTaskExecutor in this way to allow a best-effort killCursors we send through the PinnedExecutor to run on the pinned connection (we don't want to block the destruction of the TaskExecutorCursor on this command completing).
However, if the underlying executor is shutdown and rejects work, we will destroy the callback we tried to schedule that destroyes PinnedConnectionTaskExecutor on one of its own threads, which would lead to a deadlock.
This isn't a problem in production today because the search task executors have service context scope , but it makes unittesting harder because we're not resilient to the executor being shut-down (see SERVER-93757 as an example).
It also makes the API buggy because if someone uses TaskExecutorCursor in pinning mode with an executor that could shut-down, we could have the buggy behavior/deadlock described.
At a minimum, we should document and invariant on the requirement that the underlying executor cannot shut down until the callback runs. Ideally, we should refactor the lifetime-extension/destruction of the PinnedExecutor so that we don't rely on another thread from the underlying executor being available at this time.
- related to
-
SERVER-93757 Join all thread in PinnedConnectionTaskExecutorTest tear-down
- Backlog