Type: Improvement
Resolution: Fixed
Priority: Major - P3
Fix Version/s: WT2.9.3, 3.2.15, 3.4.6, 3.5.9
Affects Version/s: None
Component/s: None
Labels: None
Sprint: Storage 2017-06-19
Story Points: None
Case:

In ~~WT-3207~~ we fixed a situation where a thread could spin on a handle lock during checkpoints (including while holding the schema lock, blocking many other operations).

A recent case, running with the fix for ~~WT-3207~~ in place, suggests there is a similar (but less common) source of stalls during checkpoints.

bruce.lucas commented:

In every case there was a failed table drop, which resulted in all cursors being closed, and then a stall until the end of the checkpoint.

The stall coincides with very high CPU utilization and context switch rate, and notably 3 million "pthread mutex shared lock write-lock calls" per second for the duration of the stall.

Unlike before, "time waiting for the table lock" never budges from 0, so I guess that counter is no longer hooked up in the patch build?

Looking at the code for that counter, one thing that could explain this is a call to __wt_try_writelock in a tight loop. This appears to be a pure CPU loop, i.e. one with no calls to sched_yield, since we don't see kernel CPU utilization.

Try to reproduce this situation: insert a sleep into checkpoints, run with aggressive sweeping, try a combination of drops, creates and cursor opens. No operation should block for the duration of the checkpoint.

is depended on by

WT-3363 Add test case to detect when drops may be blocked by checkpoints

Closed

is duplicated by

SERVER-29811 extremely slow reads from secondaries after drop of unused indexes

Closed

Assignee: Michael Cahill (Inactive)
Reporter: Michael Cahill (Inactive)
Votes: 0
Watchers: 11

Created: Jun 07 2017 02:32:44 AM UTC
Updated: Oct 29 2023 04:47:49 PM UTC
Resolved: Jun 08 2017 04:18:48 PM UTC
