The test creates 1000 collections with 10 secondary indexes each, for a total of 12,000 WiredTiger tables (one table per collection plus one per index, counting each collection's automatic _id index), then performs inserts against randomly chosen collections at an aggregate rate of 1000 per second across 100 client threads (10 per second per thread). Latency is generally <1 ms, but FTDC shows periodic latency spikes of tens of milliseconds.
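For reference, here is a minimal sketch of such a workload in Python. It assumes pymongo and a local mongod started with --setParameter wiredTigerSessionCloseIdleTimeSecs=30; the database and field names are illustrative, not taken from the original test:

```python
# Minimal workload sketch, assuming pymongo and a local mongod started
# with --setParameter wiredTigerSessionCloseIdleTimeSecs=30.
# Database and field names are illustrative.
import random
import threading
import time

from pymongo import ASCENDING, MongoClient

NUM_COLLECTIONS = 1000
NUM_SECONDARY_INDEXES = 10
NUM_THREADS = 100
OPS_PER_SEC_PER_THREAD = 10

client = MongoClient()  # pymongo clients are thread-safe
db = client.idle_session_test

# 1000 collections x (1 collection table + _id index + 10 secondary
# indexes) = 12,000 WiredTiger tables.
for i in range(NUM_COLLECTIONS):
    coll = db[f"c{i}"]
    for j in range(NUM_SECONDARY_INDEXES):
        coll.create_index([(f"f{j}", ASCENDING)])

def worker():
    # Each thread inserts into a randomly chosen collection at ~10/s,
    # for an aggregate rate of ~1000 inserts per second.
    while True:
        coll = db[f"c{random.randrange(NUM_COLLECTIONS)}"]
        coll.insert_one({f"f{j}": random.random()
                         for j in range(NUM_SECONDARY_INDEXES)})
        time.sleep(1.0 / OPS_PER_SEC_PER_THREAD)

for _ in range(NUM_THREADS):
    threading.Thread(target=worker, daemon=True).start()
time.sleep(600)  # run long enough to capture several sweeps in FTDC
```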
The spikes coincide with calls to WiredTigerSessionCache::closeExpiredIdleSessions, and the interval between spikes matches wiredTigerSessionCloseIdleTimeSecs, which was set to 30 seconds for this test. I believe this is what's happening:
- closeExpiredIdleSessions takes a mutex on the WT session cache while it closes idle sessions; this stalls every operation that needs to check a session out of, or return one to, the cache.
- Because of this, up to 100 operations (one per client thread) queue up on the mutex and then proceed concurrently once closeExpiredIdleSessions completes, requiring up to 100 concurrent sessions to service them; when those operations finish, the sessions are returned to the cache.
- Between expiration intervals operations complete quickly and concurrency stays low, so only a small number of the cached sessions are actually reused.
- The next call to closeExpiredIdleSessions after the expiration interval therefore finds that most of the cached sessions are idle and closes them while holding the session cache mutex, again delaying up to 100 operations and forcing them to open up to 100 new sessions once closeExpiredIdleSessions has finished closing the 100 old ones, repeating the cycle (a small simulation of this loop follows this list).
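The feedback loop can be illustrated with a small standalone simulation. This is a sketch of the hypothesized mechanism only: the session cache is modeled as a LIFO list under a single mutex, and all constants are invented for illustration rather than taken from the server:

```python
# Standalone sketch of the hypothesized cycle. The session cache is a
# LIFO list under one mutex; the sweeper models closeExpiredIdleSessions.
# All constants are illustrative.
import threading
import time

CLIENT_THREADS = 100
SWEEP_INTERVAL = 3.0   # stands in for wiredTigerSessionCloseIdleTimeSecs
IDLE_THRESHOLD = 2.0   # sessions idle longer than this get closed
CLOSE_COST = 0.001     # per-session close time; grows with table count

lock = threading.Lock()
cached = []            # stack of (session_id, last_used_time)
next_id = 0

def get_session():
    global next_id
    with lock:
        if cached:
            return cached.pop()[0]   # LIFO: hot sessions get reused
        next_id += 1
        return next_id

def release_session(sid):
    with lock:
        cached.append((sid, time.monotonic()))

def sweeper():
    while True:
        time.sleep(SWEEP_INTERVAL)
        # The mutex is held for the whole sweep, stalling every
        # get/release call, as described in the list above.
        with lock:
            now = time.monotonic()
            keep = []
            for s in cached:
                if now - s[1] > IDLE_THRESHOLD:
                    time.sleep(CLOSE_COST)  # "closing" the idle session
                else:
                    keep.append(s)
            cached[:] = keep

def client():
    while True:
        start = time.monotonic()
        sid = get_session()
        time.sleep(0.0005)           # the operation itself is fast
        release_session(sid)
        lat_ms = (time.monotonic() - start) * 1000
        if lat_ms > 10:
            print(f"latency spike: {lat_ms:.1f} ms")
        time.sleep(1.0 / 10)         # 10 ops/sec per thread

threading.Thread(target=sweeper, daemon=True).start()
for _ in range(CLIENT_THREADS):
    threading.Thread(target=client, daemon=True).start()
time.sleep(15)
```

Running this prints a burst of spikes once per sweep interval while latency stays sub-millisecond in between: each sweep empties the cache and stalls the clients, the stalled clients then open a fresh batch of sessions that mostly sit idle, and the next sweep closes them all again.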
This explains the pattern of cached cursor counts and open sessions that we see in the FTDC above; with FTDC's 1-second sample interval we only occasionally catch the number of sessions dipping due to closeExpiredIdleSessions before it immediately climbs again, e.g. at A in the chart above.
The magnitude of this effect increases with more tables; perf data shows that closeExpiredIdleSessions spends about 65% of its time in close_cached_cursors and about 25% in __session_discard_dhandle, both of which are proportional to the number of tables.
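As a back-of-envelope check that this mechanism can plausibly produce stalls of tens of milliseconds, here is a hedged cost model; the per-item costs below are invented placeholders, not measurements:

```python
# Back-of-envelope model: closing one idle session does work
# proportional to the cursors and dhandles it has cached, which in this
# workload grows with the number of tables each session has touched.
# All per-item costs here are hypothetical placeholders.
idle_sessions = 100            # sessions left idle between sweeps
cursors_per_session = 200      # grows with tables touched per session
per_cursor_close_us = 1.0      # hypothetical: close_cached_cursors work
per_dhandle_discard_us = 0.4   # hypothetical: __session_discard_dhandle work

sweep_ms = idle_sessions * cursors_per_session * (
    per_cursor_close_us + per_dhandle_discard_us) / 1000.0
print(f"mutex held for ~{sweep_ms:.0f} ms per sweep")  # ~28 ms
```

Under these assumed per-item costs, the sweep holds the session cache mutex for roughly 28 ms, which is in the same range as the observed spikes, and the total scales linearly with the number of tables touched per session.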