Inserting a document that generates a large number of index entries can create a large amount of dirty data in a single WiredTiger transaction, causing the transaction to be rolled back for eviction and retried indefinitely, resulting in a hang.
For example, on a node with a 256 MB cache, create a text index, then insert a document with a large number of terms to be indexed:
function repro() {
    db.c.drop()
    printjson(db.c.createIndex({x: "text"}))
    doc = {x: []}
    for (var j = 0; j < 50000; j++)
        doc.x.push("" + Math.random() + Math.random())
    for (var i = 0; i < 20; i++) {
        start = new Date()
        db.c.insert(doc)
        print(new Date() - start, "ms")
    }
}
This hangs after a few documents, under high cache pressure, with the following message emitted repeatedly in the log:
{"t":{"$date":"2021-12-03T11:43:20.820-05:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"conn21","msg":"WiredTiger message","attr":{"message":"oldest pinned transaction ID rolled back for eviction"}}
This effectively makes the server inoperable due to cache pressure. If it occurs on a secondary, the node will stall, because the hang prevents completion of the current replication batch.
This is a regression as these inserts complete successfully (even if somewhat slowly) in 4.2.
I think this is related to SERVER-61454, but I'm opening this as a distinct ticket because
- This is a somewhat different use case, as the issue can be reliably reproduced with single inserts.
- I don't think the change described in SERVER-61454 would apply here: the insert is the only transaction running, so delaying retries would have no effect, and as far as I can tell the issue is not related to CPU resource starvation.
- It's not clear to me where the appropriate fix would lie: the query layer, the retry behavior, or the storage engine behavior.
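To make the "delaying retries would have no effect" point concrete, here is a minimal sketch in plain JavaScript. This is not server code: the cache size matches the repro, but the 20% dirty trigger, the per-entry dirty cost, and all function names are illustrative assumptions. Because every attempt deterministically generates the same dirty data and is rolled back at the same point, an unbounded retry loop never terminates regardless of any backoff between attempts; the loop below is capped only so the demo halts.

```javascript
// Illustrative model only: thresholds and per-entry costs are assumptions.
const CACHE_BYTES = 256 * 1024 * 1024;           // 256 MB cache, as in the repro
const EVICT_DIRTY_TRIGGER = 0.20 * CACHE_BYTES;  // assumed dirty threshold that triggers rollback
const BYTES_PER_INDEX_ENTRY = 2048;              // assumed dirty bytes per index entry
const ENTRIES_PER_DOC = 50000;                   // one text-index entry per array element

// One attempt at the insert: dirty data accumulates entry by entry; once the
// transaction's pinned dirty data crosses the trigger, eviction rolls it back.
function tryInsert() {
  let dirty = 0;
  for (let i = 0; i < ENTRIES_PER_DOC; i++) {
    dirty += BYTES_PER_INDEX_ENTRY;
    if (dirty > EVICT_DIRTY_TRIGGER) {
      return { ok: false, reason: "oldest pinned transaction ID rolled back for eviction" };
    }
  }
  return { ok: true };
}

// Simplified stand-in for the server's retry-on-rollback loop, which has no
// attempt limit. Each attempt fails identically, so backoff cannot help;
// the cap exists only so this demo terminates.
let attempts = 0;
let result;
do {
  result = tryInsert();
  attempts++;
} while (!result.ok && attempts < 10);

console.log(`gave up after ${attempts} attempts; last result: ${result.reason}`);
```

Under these assumed numbers, a single document pins ~100 MB of dirty data against a ~51 MB trigger, so every attempt fails at the same entry.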
- depends on
  - WT-9879 Fix overcounting of session txn dirty bytes statistic (Closed)
  - WT-10027 Session txn dirty statistic is incorrect (Closed)
  - SERVER-68739 Add WiredTiger session statistic without affecting slow op statistics (Closed)
  - WT-8848 Add API to roll back and indicate that a transaction has exceeded a configurable limit of pinned dirty data (Closed)
- is depended on by
  - SERVER-69480 TransientTransactionError label potentially applied incorrectly (Closed)
  - COMPASS-6311 Investigate changes in SERVER-61909: Hang inserting or deleting document with large number of index entries (Closed)
  - TOOLS-3223 Investigate changes in SERVER-61909: Hang inserting or deleting document with large number of index entries (Closed)
- is related to
  - SERVER-60839 Introduce a TemporarilyUnavailable error type (Closed)
  - SERVER-61454 Change retry policy when txns are rolled back for eviction (Closed)
- related to
  - SERVER-71750 Revert refactor into handleWriteConflictException in writeConflictRetry loop (Closed)
  - SERVER-71751 Skip transaction_too_large_for_cache.js for in-memory variants (Closed)
  - WT-8290 Adding a new API to the session to return the rollback reason (Closed)