- Type: Improvement
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
Currently, we have a timeout mechanism that rolls back operations that take too long, since such operations may lead to a stuck cache. The user can set a default timeout in wiredtiger_open, and that default is applied to all API calls: cursor operations; session operations such as verify, salvage, and checkpoint; and connection operations such as rollback to stable. The default timeout can be overridden by a timeout set at the transaction level. If no transaction-level timeout is specified, the default timeout applies.
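The precedence described above can be sketched as a small model. This is only an illustration of the rule, not WiredTiger code: the function names, the constant, and the convention that a limit of 0 means "no timeout" are all assumptions made for this sketch.

```python
from typing import Optional

# Hypothetical model; none of these names are WiredTiger identifiers.
DEFAULT_TIMEOUT_MS = 2000  # the 2-second default mentioned for test/format


def effective_timeout_ms(default_ms: int, txn_ms: Optional[int]) -> int:
    """Return the limit that governs an operation under the current design.

    A transaction-level timeout, when given, overrides the default;
    otherwise the default applies to every API call.
    """
    return default_ms if txn_ms is None else txn_ms


def times_out(elapsed_ms: int, default_ms: int, txn_ms: Optional[int] = None) -> bool:
    """True if an operation that ran for elapsed_ms would be rolled back.

    Assumption of this sketch: a limit of 0 disables the timeout.
    """
    limit = effective_timeout_ms(default_ms, txn_ms)
    return limit > 0 and elapsed_ms > limit
```

Under this model, `times_out(30_000, DEFAULT_TIMEOUT_MS)` is true for a 30-second checkpoint, while the same call with `txn_ms=60_000` is not, which is the behaviour the rest of this ticket is concerned with.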
This mechanism was initially designed to help reduce stuck-cache failures in test/format caused by slow cursor operations, so the default timeout is set to a relatively small value of 2 seconds.
Since this default timeout also applies to other session operations that usually take substantially more time to finish, such as salvage, checkpoint, and verify, the timeout mechanism may result in unexpected failures for these operations.
We need to review the design of the timeout mechanism and explore alternatives, while minimizing the impact on the API so that the new mechanism stays compatible with the old API.
We probably still want to keep the global timeout setting so that users do not need to specify a timeout for each individual operation. To improve the current mechanism, we may want the global timeout to apply only to some of the cursor operations instead of all API operations.
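One possible reading of this proposal, sketched as a model rather than as a design decision: the global timeout is consulted only for an allow-list of cursor operations, and every other operation runs without a limit unless a timeout is given explicitly. The operation names, the allow-list contents, and the function name are hypothetical and chosen only for illustration.

```python
from typing import Optional

# Hypothetical allow-list: which cursor operations the global timeout covers
# is exactly what this ticket proposes to decide.
GLOBAL_TIMEOUT_OPS = frozenset(
    {"cursor.search", "cursor.next", "cursor.insert", "cursor.update"}
)


def scoped_timeout_ms(
    op: str, global_ms: int, explicit_ms: Optional[int] = None
) -> Optional[int]:
    """Return the limit for op in milliseconds, or None for no limit.

    An explicitly specified timeout always wins. Otherwise the global
    timeout applies only to operations in the allow-list.
    """
    if explicit_ms is not None:
        return explicit_ms
    if op in GLOBAL_TIMEOUT_OPS:
        return global_ms
    return None
```

With a 2000 ms global setting, `scoped_timeout_ms("cursor.search", 2000)` still returns 2000, while `scoped_timeout_ms("session.checkpoint", 2000)` returns `None`, so long-running session operations are no longer cut short by a limit tuned for cursor operations. Existing callers that set only the global timeout would keep the same configuration surface.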
- related to: WT-6802 Don't set operation timer for internal and reentry api calls (Closed)