-
Type: Bug
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: 1.4.0
-
Component/s: None
-
None
-
Environment:PHPC 1.4.2, PHPLIB 1.3.1, PHP 7.1.15
MongoDB 3.6.3 replica set using FCV 3.6 and authentication
In mongodb/mongo-php-library#517, a user reported a case where the getMore for an aggregate command produces the following server error:
Cursor session id (142df95e-b30c-409b-ba3c-213d19c8bbee - TDTj2DYnZ/sCdyr5i0hFB2XrtJGBm4wdvewCHRKZkn0=) is not the same as the operation context's session id (55fa9062-7ad7-4416-a169-4b10415a3e5e - TDTj2DYnZ/sCdyr5i0hFB2XrtJGBm4wdvewCHRKZkn0=)
This error was alluded to in CDRIVER-2506, although that issue was resolved in libmongoc 1.9.3, which is included in our 1.4.2 release. As the user states, this error only appears with the aggregate command. I was also able to reproduce this in a local environment, so perhaps CDRIVER-2506 did not fully resolve the issue (or there was a problem with the backport to 1.9.x).
Consider the user's script:
$client = new \MongoDB\Client(REPLICASET_AUTH_URI); // test.test collection contains > 101 documents $cursor = $client->test->test->find(); $recordsScanned = 0; foreach ($cursor as $row) { ++$recordsScanned; echo 'A:' . $recordsScanned . "\n"; } $cursor = $client->test->test->aggregate([]); $recordsScanned = 0; foreach ($cursor as $row) { ++$recordsScanned; echo 'B:' . $recordsScanned . "\n"; }
The had the following observations:
- The error still happens if the first query find() is replaced with aggregate()
- The error does not happen if the second query aggregate() is replaced with find()
- The error does not happen if the second query stores result into a differently named variable (e.g. $cursor2 instead of $cursor)
I believe the third bullet is related to the first cursor being freed and releasing the implicit session back to the LIFO session pool. When different variable names are used for each cursor, the find cursor is not freed until the end of the script and its session remains checked out of the pool. That allows the aggregate cursor to select a new session from the pool and re-use that session for its getMore.
When both cursor variables use the same name, the first cursor is not freed until after the second cursor is already assigned. This means that the aggregate command gets a new session from the pool; however, its getMore operation will then get the session originally used for the first operation (it was more recently returned to the pool – LIFO).
In my own testing, I saw the error occur even if both operations used explicit sessions (i.e. "session" option provided) sessions. This revealed what I believe to be a root cause of the issue: the getMore for aggregate is not guaranteed to reuse the same session as its original original aggregate command. This appears to be unique for aggregate (which we execute using executeCommand() and friends). I could not reproduce the issue with find commands (which use executeQuery()).
Consider the following script (where "test.test" contains > 101 documents):
class SessionLogger implements MongoDB\Driver\Monitoring\CommandSubscriber { public function commandStarted(MongoDB\Driver\Monitoring\CommandStartedEvent $event) { if (isset($event->getCommand()->lsid)) { printf("%s command with lsid: %s\n", $event->getCommandName(), bin2hex((string) $event->getCommand()->lsid->id)); } } public function commandSucceeded(MongoDB\Driver\Monitoring\CommandSucceededEvent $event) { } public function commandFailed(MongoDB\Driver\Monitoring\CommandFailedEvent $event) { } } MongoDB\Driver\Monitoring\addSubscriber(new SessionLogger); $client = new \MongoDB\Client(REPLICASET_AUTH_URI); echo "Testing find commands with implicit sessions\n"; $cursor = $client->test->test->find([]); $cursor->toArray(); $cursor = $client->test->test->find([]); $cursor->toArray(); echo "\nTesting find commands with explicit sessions\n"; $session = $client->startSession(); printf("Explicit session with lsid: %s\n", bin2hex((string) $session->getLogicalSessionId()->id)); $cursor = $client->test->test->find([], ['session' => $session]); $cursor->toArray(); $cursor = $client->test->test->find([], ['session' => $session]); $cursor->toArray(); echo "\nTesting aggregate commands with implicit sessions\n"; $cursor = $client->test->test->aggregate([]); try { $cursor->toArray(); } catch (MongoDB\Driver\Exception\RuntimeException $e) { printf("RuntimeException(%d): %s\n", $e->getCode(), $e->getMessage()); } $cursor = $client->test->test->aggregate([]); try { $cursor->toArray(); } catch (MongoDB\Driver\Exception\RuntimeException $e) { printf("RuntimeException(%d): %s\n", $e->getCode(), $e->getMessage()); } echo "\nTesting aggregate commands with explicit sessions\n"; $session = $client->startSession(); printf("Explicit session with lsid: %s\n", bin2hex((string) $session->getLogicalSessionId()->id)); $cursor = $client->test->test->aggregate([], ['session' => $session]); try { $cursor->toArray(); } catch (MongoDB\Driver\Exception\RuntimeException $e) { printf("RuntimeException(%d): %s\n", $e->getCode(), $e->getMessage()); } $cursor = $client->test->test->aggregate([], ['session' => $session]); try { $cursor->toArray(); } catch (MongoDB\Driver\Exception\RuntimeException $e) { printf("RuntimeException(%d): %s\n", $e->getCode(), $e->getMessage()); }
This produces the following output:
Testing find commands with implicit sessions find command with lsid: f8fedb095d6a4985af13b53a425c8d32 getMore command with lsid: f8fedb095d6a4985af13b53a425c8d32 find command with lsid: 625e9fdc4d3348e08cf1e9e359ba17a1 getMore command with lsid: 625e9fdc4d3348e08cf1e9e359ba17a1 Testing find commands with explicit sessions Explicit session with lsid: f8fedb095d6a4985af13b53a425c8d32 find command with lsid: f8fedb095d6a4985af13b53a425c8d32 getMore command with lsid: f8fedb095d6a4985af13b53a425c8d32 find command with lsid: f8fedb095d6a4985af13b53a425c8d32 getMore command with lsid: f8fedb095d6a4985af13b53a425c8d32 Testing aggregate commands with implicit sessions aggregate command with lsid: 625e9fdc4d3348e08cf1e9e359ba17a1 getMore command with lsid: 625e9fdc4d3348e08cf1e9e359ba17a1 aggregate command with lsid: 11e88ca9f7b548d79e8e95511a1e6a22 getMore command with lsid: 625e9fdc4d3348e08cf1e9e359ba17a1 RuntimeException(13): Cursor session id (11e88ca9-f7b5-48d7-9e8e-95511a1e6a22 - pzarhKY2owve6qhN2jyojBF8O3wMotRKbXhCmcmZSlE=) is not the same as the operation context's session id (625e9fdc-4d33-48e0-8cf1-e9e359ba17a1 - pzarhKY2owve6qhN2jyojBF8O3wMotRKbXhCmcmZSlE=) Testing aggregate commands with explicit sessions Explicit session with lsid: 11e88ca9f7b548d79e8e95511a1e6a22 aggregate command with lsid: 11e88ca9f7b548d79e8e95511a1e6a22 killCursors command with lsid: 625e9fdc4d3348e08cf1e9e359ba17a1 getMore command with lsid: 625e9fdc4d3348e08cf1e9e359ba17a1 RuntimeException(13): Cursor session id (11e88ca9-f7b5-48d7-9e8e-95511a1e6a22 - pzarhKY2owve6qhN2jyojBF8O3wMotRKbXhCmcmZSlE=) is not the same as the operation context's session id (625e9fdc-4d33-48e0-8cf1-e9e359ba17a1 - pzarhKY2owve6qhN2jyojBF8O3wMotRKbXhCmcmZSlE=) aggregate command with lsid: 11e88ca9f7b548d79e8e95511a1e6a22 killCursors command with lsid: 625e9fdc4d3348e08cf1e9e359ba17a1 getMore command with lsid: 625e9fdc4d3348e08cf1e9e359ba17a1 RuntimeException(13): Cursor session id (11e88ca9-f7b5-48d7-9e8e-95511a1e6a22 - pzarhKY2owve6qhN2jyojBF8O3wMotRKbXhCmcmZSlE=) is not the same as the operation context's session id (625e9fdc-4d33-48e0-8cf1-e9e359ba17a1 - pzarhKY2owve6qhN2jyojBF8O3wMotRKbXhCmcmZSlE=) killCursors command with lsid: 625e9fdc4d3348e08cf1e9e359ba17a1
The killCursors output is the result of a previous cursor object being freed because it has a non-zero cursor ID. You'll note that the session ID used for killCursors corresponds to the previous getMore responsible for the preceding exception.
Here we see that find cursors have no problem reusing the correct session ID for their getMore commands; however, aggregate cursors seem to check a new implicit session out of the LIFO pool each time, regardless of whether the original aggregate used an implicit or explicit session.
- is depended on by
-
PHPLIB-342 Change streams should use the same session when resuming
- Closed
- is related to
-
PHPC-1151 Possible segfault is Session object is freed before Cursor
- Closed
-
CDRIVER-2506 getMore doesn't always use same implicit session as find/aggregate/etc.
- Closed
-
CDRIVER-2578 mongoc_change_stream_t can send getMore with wrong lsid
- Closed
- related to
-
PHPC-1163 Do not allow a session to be used with an unacknowledged write concern
- Closed
-
PHPC-1161 Free reference to Session once Cursor is exhausted
- Closed
- links to