Our RecordId initialization has a bug that is one instance of the more general problem described in SERVER-61116. As a consequence, very large multi-document transactions that consume most of the cache can deadlock. In practice, this can only happen when the transaction is the first one to write to a given collection.
As of SERVER-58409, we open a new WT_SESSION and call largest_key() on it to lazily initialize the highest RecordId for a collection. We can do this while the same thread is holding hostage another session that pins a large amount of data in the cache. If that large transaction pins enough data, the largest_key() call can block. However, the session pinning that content can never be rolled back, because it is held by the same thread that is now blocked.
We should set "operation_timeout_ms" here, as we did in SERVER-61097. Once the timeout elapses, the operation receives WT_ROLLBACK, which we should throw back to the parent operation so that it can retry.
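The proposed fix can be sketched as a small, self-contained C++ simulation of its control flow. It does not call WiredTiger: `largestKeyWithTimeout`, `initHighestRecordId`, `runParentWithRetry`, `RetryableRollbackError`, and `kSimulatedRollback` are all hypothetical stand-ins (the error type is only loosely modeled on the TemporarilyUnavailable error type from SERVER-60839), and a boolean models whether the same thread's large transaction still pins the cache. The sketch shows the timed-out call returning a rollback code instead of blocking, that code being thrown as an exception, and the parent operation catching it and retrying.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for WiredTiger return codes (the real WT_ROLLBACK
// is defined in wiredtiger.h; this value is not meant to match it).
constexpr int kSimulatedOk = 0;
constexpr int kSimulatedRollback = 1;

// Hypothetical error type that asks the parent operation to retry.
class RetryableRollbackError : public std::runtime_error {
public:
    using std::runtime_error::runtime_error;
};

// Hypothetical stand-in for calling largest_key() on a fresh session whose
// operations have a timeout. `cacheIsPinned` models another session on the
// same thread pinning most of the cache. Without a timeout this call would
// block forever; with one, it gives up and reports a rollback.
int largestKeyWithTimeout(bool cacheIsPinned, std::int64_t* out) {
    if (cacheIsPinned) {
        return kSimulatedRollback;
    }
    *out = 42;  // Pretend this is the highest existing RecordId.
    return kSimulatedOk;
}

// Lazily initializes the highest RecordId, converting the rollback code
// into an exception that propagates to the parent operation.
std::int64_t initHighestRecordId(bool cacheIsPinned) {
    std::int64_t highest = 0;
    if (largestKeyWithTimeout(cacheIsPinned, &highest) == kSimulatedRollback) {
        throw RetryableRollbackError("largest_key() timed out; retry the parent operation");
    }
    return highest;
}

// Parent operation with a bounded retry loop. Assumption: rolling back the
// parent's own transaction before retrying releases the pinned cache.
std::int64_t runParentWithRetry(int maxAttempts, int& attemptsUsed) {
    assert(maxAttempts > 0);
    bool cacheIsPinned = true;  // First attempt runs inside the large transaction.
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        attemptsUsed = attempt;
        try {
            return initHighestRecordId(cacheIsPinned) + 1;  // Next RecordId to assign.
        } catch (const RetryableRollbackError&) {
            cacheIsPinned = false;  // Simulated rollback of the parent transaction.
        }
    }
    throw std::runtime_error("gave up after the maximum number of attempts");
}
```

With `int attempts = 0;`, calling `runParentWithRetry(3, attempts)` returns 43 and leaves `attempts == 2`: the first attempt times out while the cache is pinned, and the retry succeeds once the parent's transaction has been rolled back. Bounding the number of retries keeps a persistently failing call from looping forever.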
- is caused by
  - SERVER-58409 Startup RecordId initialization is flawed with durable history and reconstructing prepared transactions (Closed)
- related to
  - SERVER-60839 Introduce a TemporarilyUnavailable error type (Closed)
  - SERVER-61116 Audit and add assertions against using multiple WT_SESSIONs on the same thread (Backlog)