-
Type: Bug
-
Resolution: Fixed
-
Priority: Critical - P2
-
Affects Version/s: 4.0.0
-
Component/s: Querying
-
Fully Compatible
-
ALL
-
v4.2, v4.0
-
Query 2019-04-08, Query 2019-04-22, Query 2019-05-06, Query 2019-05-20, Query 2019-06-03, Query 2019-06-17, Query 2019-07-01, Query 2019-07-15, Query 2019-07-29, Query 2019-08-12
-
30
bool isKillPending() const { // A cursor is kill pending if it's checked out by an OperationContext that was // interrupted. return _operationUsingCursor && !_operationUsingCursor->checkForInterruptNoAssert().isOK(); }
is calling checkForInterruptNoAssert.
That's a problem because:
- the rule for opCtx is that the owning thread can call all methods, and external threads can call some methods, while holding the client lock
- isKillPending is called from ClusterCursorManager::stats, which is called by ftdc, which doesn't hold any client locks
- checkForInterruptNoAssert isn't meant to be called from off thread. opCtx->isKillPending() is the way to do that
- checkForInterrupt can actually invoke markKilled, if the op is now timed out
- markKilled does a dance with waitForConditionOrInterrupt that, when the latter is active, does:
- an unlock of the client lock. This probably kicks the system into an unanticipated state
- a lock of the mutex specified in waitForConditionOrInterrupt
- a lock of the client lock
- Setting the killcode
- unlocking the mutex specified in waitForConditionOrInterrupt
I think this doesn't usually come up because the ops using cluster cursors usually haven't exceeded maxtimems. That may be an avenue for an isolated repro.
- is duplicated by
-
SERVER-40100 Mongos hangs without any log messages
- Closed
- related to
-
SERVER-40100 Mongos hangs without any log messages
- Closed