- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- None
- Cluster Scalability
- Fully Compatible
- ALL
- v8.0
- Cluster Scalability 2024-07-22
- 200
Root cause of the bug:
As part of chunk migration, MigrationDestinationManager::_migrateDriver() is called. It runs with a logical session checked out and uses retryable writes for replay protection. Its first step is to "Ensure any data which might have been left orphaned in the range being moved has been deleted." This is done by calling rangedeletionutil::checkForConflictingDeletions(), which in turn runs PersistentTaskStore.count(). count() builds a find command and calls itcount() on it, which issues the find and then subsequent getMore commands until the cursor is exhausted. It is one of these getMore commands that fails with the txnNumber error.
PersistentTaskStore.count() uses DBDirectClient, an internal C++ API, which causes mongod to behave as if the find and getMore commands had received a txnNumber without being part of a multi-statement transaction.
This was not an issue before the changes to DBDirectClient that were introduced as part of SERVER-88895.
Possible solutions could be:
- Wrap the call to rangedeletionutil::checkForConflictingDeletions() in MigrationDestinationManager inside a runWithoutSession() call
- SERVER-77332
- SERVER-91662
- Revert SERVER-88895?
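To illustrate the first option, here is a minimal, self-contained toy model of the failure and of the "run without session" workaround. It is only a sketch: ToyOpCtx, toyGetMore, toyCount, toyRunWithoutSession and toyDemo are hypothetical names invented for this example and are not the server's actual OperationContext, DBDirectClient, or runWithoutSession() code. The assumption is simply that the getMore path rejects a txnNumber outside a multi-statement transaction, and that the workaround hides the session state for the duration of the call and restores it afterwards.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <utility>

// Toy stand-in for an operation context (hypothetical, not mongo::OperationContext).
struct ToyOpCtx {
    std::optional<long long> txnNumber;  // set while a session is checked out for a retryable write
    bool inMultiStatementTxn = false;    // true only inside a multi-statement transaction
};

// Toy getMore: rejects a txnNumber that is not part of a multi-statement transaction.
inline int toyGetMore(const ToyOpCtx& opCtx) {
    if (opCtx.txnNumber && !opCtx.inMultiStatementTxn) {
        throw std::runtime_error("getMore: txnNumber without a multi-statement transaction");
    }
    return 1;  // pretend one more document was fetched
}

// Toy count(): in this model, exhausting the cursor just means one getMore.
inline int toyCount(const ToyOpCtx& opCtx) {
    return toyGetMore(opCtx);
}

// Hypothetical analogue of a "run without session" helper: clears the
// txnNumber while fn runs and restores it afterwards, even if fn throws.
template <typename Callable>
auto toyRunWithoutSession(ToyOpCtx& opCtx, Callable&& fn) {
    struct Restore {
        ToyOpCtx& ctx;
        std::optional<long long> saved;
        ~Restore() { ctx.txnNumber = saved; }
    } restore{opCtx, opCtx.txnNumber};
    opCtx.txnNumber.reset();
    return std::forward<Callable>(fn)();
}

// Small self-check showing both the failure and the workaround.
inline void toyDemo() {
    ToyOpCtx opCtx;
    opCtx.txnNumber = 5;  // session checked out, as in the migration driver

    bool threw = false;
    try {
        toyCount(opCtx);  // direct call: getMore sees the txnNumber and fails
    } catch (const std::runtime_error&) {
        threw = true;
    }
    assert(threw);

    // Wrapped call: the txnNumber is hidden for the duration of the count.
    int n = toyRunWithoutSession(opCtx, [&] { return toyCount(opCtx); });
    assert(n == 1);
    assert(opCtx.txnNumber && *opCtx.txnNumber == 5);  // state restored afterwards
}
```

In this model the caller keeps its session state for the rest of the migration, and only the conflicting-deletions check runs without it.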
- is related to
  - SERVER-88895 Cursor contains txnNumber when created from a retryable write (BulkWrite) (Closed)
- related to
  - SERVER-92480 Unset txn state from opCtx when checking session in (Backlog)