Session information is stored in the system.sessions collection in the config database. Information about active sessions is cached in the LogicalSessionCache. The cache is periodically refreshed, which both
- kills cursors inside sessions that are no longer present in the underlying system.sessions collection, and
- flushes new cached session information out to system.sessions.
Suppose that a cache refresh is happening concurrently with a startSession command. A session's cursor can be unexpectedly killed out from under the client if the session record has not yet been written out to the system.sessions collection. The cache refresh code attempts to write new sessions out to system.sessions prior to killing any cursors. However, there is no synchronization to ensure that no new session comes into being between writing out these new sessions and killing cursors. This means that the following can take place:
- A cache refresh begins, and active cache entries are written to system.sessions.
- A new session is started and enters the LogicalSessionCache. A cursor is opened inside this session.
- The refresh code notices that there is a session with a cursor which is not represented in system.sessions. It kills the cursor, despite the cursor still being in use by the client and the session still being alive.
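The interleaving above can be reproduced deterministically in a small model. The following is only an illustrative Python sketch, not the server's C++ code: the class `SessionCacheModel`, its fields, and the `between_steps` hook are hypothetical names introduced here to stand in for the cache, the system.sessions collection, and a concurrent startSession.

```python
class SessionCacheModel:
    """Toy model of the refresh described above (hypothetical, not server code)."""

    def __init__(self):
        self.active_sessions = set()      # in-memory cache entries
        self.sessions_collection = set()  # stand-in for system.sessions
        self.open_cursors = {}            # session id -> cursor id

    def start_session(self, session_id):
        self.active_sessions.add(session_id)

    def open_cursor(self, session_id, cursor_id):
        self.open_cursors[session_id] = cursor_id

    def refresh(self, between_steps=None):
        # Step 1: flush the sessions that are cached at this moment.
        self.sessions_collection |= set(self.active_sessions)
        # Hook simulating a startSession that lands between the two steps.
        if between_steps is not None:
            between_steps()
        # Step 2: kill cursors whose session has no persisted record.
        killed = [sid for sid in self.open_cursors
                  if sid not in self.sessions_collection]
        for sid in killed:
            del self.open_cursors[sid]
        return killed


def demo_race():
    cache = SessionCacheModel()
    cache.start_session("s1")
    cache.open_cursor("s1", 101)

    def late_arrival():
        # A new session and cursor appear after the flush, before the kill.
        cache.start_session("s2")
        cache.open_cursor("s2", 202)

    return cache.refresh(between_steps=late_arrival)
```

In this model, `demo_race()` reports that the cursor of the late session `s2` is killed even though the session is alive, while `s1` is unaffected because it was flushed in step 1.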
Fix Implementation
The issue is caused by a race in LogicalSessionCache.
If the method LogicalSessionCacheImpl::_addToCache (https://github.com/mongodb/mongo/blob/r4.1.0/src/mongo/db/logical_session_cache_impl.cpp#L392)
adds a session between https://github.com/mongodb/mongo/blob/r4.1.0/src/mongo/db/logical_session_cache_impl.cpp#L333 and https://github.com/mongodb/mongo/blob/r4.1.0/src/mongo/db/logical_session_cache_impl.cpp#L357, the session is considered removed because it is not in the sessions collection, and it gets killed.
To fix this, sessions freshly added to the activeSessions set in the _addToCache method must carry an attribute that indicates whether they have been synced with the sessions collection. The attribute is initially false and becomes true once refreshSessions has been called.
Hence findRemovedSessions must only look at the sessions that have this attribute set to true.
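The proposed fix can be sketched in the same spirit. This is a minimal Python model under stated assumptions, not the actual patch: `FixedSessionCacheModel` and its snake_case methods are hypothetical stand-ins for _addToCache, refreshSessions and findRemovedSessions, and the model assumes for simplicity that a refresh only writes entries that are not yet synced.

```python
class FixedSessionCacheModel:
    """Toy model of the 'synced' attribute fix (hypothetical, not server code)."""

    def __init__(self):
        self.active_sessions = {}         # session id -> synced flag
        self.sessions_collection = set()  # stand-in for system.sessions
        self.open_cursors = {}            # session id -> cursor id

    def add_to_cache(self, session_id):
        # New entries start out unsynced.
        self.active_sessions.setdefault(session_id, False)

    def open_cursor(self, session_id, cursor_id):
        self.open_cursors[session_id] = cursor_id

    def refresh_sessions(self):
        # Assumption: only unsynced entries are written, then marked synced.
        for sid, synced in list(self.active_sessions.items()):
            if not synced:
                self.sessions_collection.add(sid)
                self.active_sessions[sid] = True

    def find_removed_sessions(self):
        # Only synced entries may be judged against the collection.
        return {sid for sid, synced in self.active_sessions.items()
                if synced and sid not in self.sessions_collection}

    def refresh(self, between_steps=None):
        self.refresh_sessions()
        if between_steps is not None:
            between_steps()  # simulates a session added mid-refresh
        removed = self.find_removed_sessions()
        for sid in removed:
            self.open_cursors.pop(sid, None)
            del self.active_sessions[sid]
        return removed


def demo_fix():
    cache = FixedSessionCacheModel()
    cache.add_to_cache("s1")
    cache.open_cursor("s1", 101)

    def late_arrival():
        cache.add_to_cache("s2")
        cache.open_cursor("s2", 202)

    removed = cache.refresh(between_steps=late_arrival)
    return removed, dict(cache.open_cursors)
```

Here a session added mid-refresh is skipped by `find_removed_sessions` because its flag is still false, so its cursor survives; it is picked up and marked synced on the next refresh. A session whose record genuinely disappears from the collection after being synced is still detected as removed.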
- is depended on by
  - NODE-1482 Cursor not found issue (Closed)
- is duplicated by
  - SERVER-34053 Cursor not found error when running long query on secondary with noCursorTimeout (Closed)
  - SERVER-35484 Active cursor with Session disappears after a few minutes (Closed)
- is related to
  - SERVER-36808 Server closes cursors that are still in use during session cache refresh (Closed)