- Type: Task
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- StorEng - Refinement Pipeline
mathias@mongodb.com pointed out that we could use a feature of Intel's x86 CPUs to speed up our crc32c implementation:
On x86 (at least on Intel) the story is a bit different. There, the crc32c instruction has a 3-cycle latency, but three of them can execute in parallel as long as they are independent, so the standard pattern is to process each chunk as three parallel streams and then merge the results. See WT-2121 for some discussion. It may be worth reopening that and trying again since you are already set up to test it. You can find a BSD-or-GPL licensed implementation written by Intel engineers linked from that ticket, but to save you a hop, here you go.
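To make the "three parallel streams plus a merge" pattern concrete, here is a minimal, portable sketch. It is not Intel's implementation and not WiredTiger code. All function names (`crc32c_step`, `crc32c_shift`, `crc32c_serial`, `crc32c_3way`) are hypothetical. The bitwise `crc32c_step` stands in for the hardware crc32c instruction, and the merge uses a deliberately naive shift that feeds zero bytes through the register. A real implementation would presumably replace that shift with precomputed tables or a carry-less multiply, otherwise the merge would cost more than it saves.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32C_POLY 0x82F63B78u /* Castagnoli polynomial, bit-reflected */

/* Advance the raw CRC register by one byte (software stand-in for the crc32c instruction). */
static uint32_t
crc32c_step(uint32_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int k = 0; k < 8; k++)
        crc = (crc >> 1) ^ (CRC32C_POLY & ((uint32_t)0 - (crc & 1u)));
    return crc;
}

/* Advance the register over n zero bytes. Naive O(n) version, for illustration only. */
static uint32_t
crc32c_shift(uint32_t crc, size_t n)
{
    while (n-- > 0)
        crc = crc32c_step(crc, 0);
    return crc;
}

/* Reference version: a single serial dependency chain. */
uint32_t
crc32c_serial(const uint8_t *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++)
        crc = crc32c_step(crc, p[i]);
    return crc ^ 0xFFFFFFFFu;
}

/* Three independent chains over equal thirds of the buffer, followed by a merge. */
uint32_t
crc32c_3way(const uint8_t *p, size_t len)
{
    size_t n = len / 3;
    /* Only the first stream carries the initial value; the others start at zero. */
    uint32_t c0 = 0xFFFFFFFFu, c1 = 0, c2 = 0;

    /* The three updates per iteration do not depend on each other. */
    for (size_t i = 0; i < n; i++) {
        c0 = crc32c_step(c0, p[i]);
        c1 = crc32c_step(c1, p[n + i]);
        c2 = crc32c_step(c2, p[2 * n + i]);
    }

    /* Merge: CRC is linear over GF(2), so shifting a prefix CRC past the next
     * stream's length and XORing in that stream's CRC concatenates the two. */
    c0 = crc32c_shift(c0, n) ^ c1;
    c0 = crc32c_shift(c0, n) ^ c2;

    /* Handle the 0 to 2 leftover bytes serially. */
    for (size_t i = 3 * n; i < len; i++)
        c0 = crc32c_step(c0, p[i]);
    return c0 ^ 0xFFFFFFFFu;
}
```

The point of the structure is that `c0`, `c1`, and `c2` form separate dependency chains inside the loop, so with the hardware instruction the CPU could keep three updates in flight instead of waiting out the full latency of each one. Whether that yields a net gain depends on the cost of the merge relative to the chunk size, which is what this ticket would need to measure.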
This ticket is to explore the feasibility of this approach and measure how much performance we can gain from it.
- related to: WT-12011 Speed up crc32c on arm64 (Closed)