Type: Improvement
Resolution: Won't Fix
Priority: Major - P3
Affects Version/s: None
Component/s: None
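WiredTiger currently uses a serial (single-stream) implementation of hardware-accelerated CRC32C. On Intel CPUs the CRC32 instruction has a latency of three cycles but can start a new operation every cycle, so a single chain of dependent CRC32 instructions cannot use the available throughput. Intel therefore recommends splitting the buffer into three independent streams, keeping three CRC32 instructions in flight at once, and merging the partial results at the end. Intel's benchmarks show a 2.6x speedup from this.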
Linux includes a BSD-licensed implementation of this algorithm, written by Intel engineers, that we should consider adopting: https://github.com/torvalds/linux/blob/master/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
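The following C code sketches the three-stream idea. It makes several assumptions. The names `crc32c_linear` and `crc32c_3way` are hypothetical and are not WiredTiger's API. A bitwise software step stands in for the SSE4.2 CRC32 instruction so the code runs on any CPU. The lanes are merged with a GF(2) multiply by x^(8n) mod P, which is a simplified stand-in for the precomputed-constant, carry-less-multiply (PCLMULQDQ) merge used in the Linux assembly. The sketch shows the structure and the merge math, not the speedup itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-reflected CRC32C (Castagnoli) polynomial. */
#define CRC32C_POLY 0x82F63B78u

/* One byte of raw CRC32C (no pre/post inversion). This is the step the SSE4.2
 * CRC32 instruction performs in hardware; it is done bitwise here only so the
 * sketch is portable. */
static uint32_t crc32c_byte(uint32_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int k = 0; k < 8; k++)
        crc = (crc >> 1) ^ (CRC32C_POLY & (0u - (crc & 1u)));
    return crc;
}

/* a * b mod P in the bit-reflected representation (bit 31 is x^0). */
static uint32_t gf2_mulmod(uint32_t a, uint32_t b)
{
    uint32_t p = 0;
    for (uint32_t m = 0x80000000u; m != 0; m >>= 1) {
        if (a & m)
            p ^= b;
        b = (b & 1u) ? (b >> 1) ^ CRC32C_POLY : b >> 1; /* b *= x mod P */
    }
    return p;
}

/* x^(8n) mod P: multiplying a raw CRC by this equals feeding it n zero bytes.
 * An optimized version would precompute this once per lane length. */
static uint32_t shift_constant(size_t n)
{
    uint32_t r = 0x80000000u; /* the polynomial 1 */
    for (size_t i = 0; i < 8 * n; i++)
        r = (r & 1u) ? (r >> 1) ^ CRC32C_POLY : r >> 1;
    return r;
}

/* Reference version: one serial dependency chain. */
uint32_t crc32c_linear(uint32_t crc, const uint8_t *buf, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++)
        crc = crc32c_byte(crc, buf[i]);
    return ~crc;
}

/* Three independent lanes over consecutive thirds of buf, merged at the end. */
uint32_t crc32c_3way(uint32_t crc, const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t n = len / 3;                 /* bytes per lane */
    uint32_t ca = ~crc, cb = 0, cc = 0; /* only lane A carries the initial value */

    for (size_t i = 0; i < n; i++) {
        /* These three updates do not depend on each other, so with hardware
         * CRC32 instructions they can overlap in the pipeline. */
        ca = crc32c_byte(ca, buf[i]);
        cb = crc32c_byte(cb, buf[n + i]);
        cc = crc32c_byte(cc, buf[2 * n + i]);
    }

    /* Raw CRC is linear over GF(2):
     * crc(A || B || C) = shift(shift(A, n) ^ B, n) ^ C. */
    uint32_t k = shift_constant(n);
    uint32_t r = gf2_mulmod(k, gf2_mulmod(k, ca) ^ cb) ^ cc;

    for (size_t i = 3 * n; i < len; i++) /* 0-2 leftover bytes */
        r = crc32c_byte(r, buf[i]);
    return ~r;
}
```

Both functions should return 0xE3069283 for the nine-byte string "123456789", which is the standard CRC32C check value. A hardware version would process 8 bytes per lane step with `_mm_crc32_u64` and use much longer lanes, so the fixed cost of the merge is amortized.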