Type: Task
Resolution: Done
Fix Version/s: WT2.2.1
Affects Version/s: None
Component/s: None
Labels: None
Sprint: None
Story Points: None

I've been investigating the long latencies seen with basho_bench and riak. The underlying cause of quite a number of them appears to be snappy compression and decompression: somehow we are able to get snappy into a very slow state. In the thin WT layer, I've been recording every call to snappy_compress or snappy_uncompress that takes longer than 0.75 seconds, and aborting if a call takes longer than 1 second. On one run, I had 9 entries before it aborted. Each one was a decompress of between 12280 and 12290 bytes (compressed length), and the recorded times grew progressively longer, from 0.75 seconds up to the abort at more than 1 second. On another run, the only entry was for the aborting thread, where a compress took 1.02 seconds to reduce 98220 bytes to 48222 bytes.
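For anyone wanting to reproduce the measurement, the instrumentation described above can be sketched roughly as follows. This is only an illustration, not the code in the WT layer: the names `timed_codec_call`, `classify_latency`, `codec_fn` and `copy_codec` are hypothetical, the callback signature is an assumption, and `copy_codec` is a memcpy stand-in so the sketch builds without linking snappy. Only the 0.75 second and 1 second thresholds come from the description.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define WARN_SECS  0.75  /* record calls slower than this */
#define ABORT_SECS 1.0   /* abort on calls slower than this */

/* Hypothetical callback type: on entry *dst_len is the capacity of dst,
 * on success it holds the number of bytes written. Returns 0 on success. */
typedef int (*codec_fn)(const char *src, size_t src_len,
                        char *dst, size_t *dst_len);

enum latency_class { LAT_OK, LAT_WARN, LAT_ABORT };

/* Wall-clock seconds between two monotonic timestamps. */
static double elapsed_secs(const struct timespec *start,
                           const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec) +
           (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Map a duration onto the two thresholds. */
static enum latency_class classify_latency(double secs)
{
    if (secs > ABORT_SECS)
        return LAT_ABORT;
    if (secs > WARN_SECS)
        return LAT_WARN;
    return LAT_OK;
}

/* Time one compress/uncompress call, log it if slow, abort if very slow. */
static int timed_codec_call(const char *name, codec_fn fn,
                            const char *src, size_t src_len,
                            char *dst, size_t *dst_len)
{
    struct timespec start, end;
    int ret;

    assert(fn != NULL);
    clock_gettime(CLOCK_MONOTONIC, &start);
    ret = fn(src, src_len, dst, dst_len);
    clock_gettime(CLOCK_MONOTONIC, &end);

    double secs = elapsed_secs(&start, &end);
    enum latency_class cls = classify_latency(secs);
    if (cls != LAT_OK)
        fprintf(stderr, "%s: %zu bytes in, %zu bytes out, %.3f sec\n",
                name, src_len, *dst_len, secs);
    if (cls == LAT_ABORT)
        abort();  /* stop here so the slow state can be inspected */
    return ret;
}

/* Stand-in codec (plain copy) so the sketch is testable without snappy. */
static int copy_codec(const char *src, size_t src_len,
                      char *dst, size_t *dst_len)
{
    if (*dst_len < src_len)
        return -1;
    memcpy(dst, src, src_len);
    *dst_len = src_len;
    return 0;
}
```

In a real build the stand-in would be replaced by a thin adapter around the actual snappy calls. Using `CLOCK_MONOTONIC` keeps the measurement immune to wall-clock adjustments, and logging the input and output sizes alongside the time makes it possible to check whether the slow calls cluster around particular block sizes.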

I have now bumped the abort threshold to 1.5 seconds, just to see how many entries I collect before aborting. I'm also going to look at the snappy source that we build into wterl/riak, which is version 1.0.4, to see if there is anything obvious there.

This performance issue reproduces 100% of the time when running the fruitpop test followed by the fruitload test on the AWS SSD box.