- Type: Improvement
- Resolution: Won't Fix
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: Btree
- Storage Engines
- 8
- 2024-02-20_A_near-death_puffin, 2024-03-05 - Claronald, 2024-03-19 - PacificOcean, 2024-04-02 - GreatMugshot, 나비 (nabi) - 2024-04-16
I noticed that this code tries to use Arm NEON SIMD to compare 16 bytes at a time. However, it only does so when the arguments are aligned, even though the SIMD load instructions have no alignment requirements. At the very least, we should remove the alignment check.
But I think we should do much more.
- When comparing only 16 bytes at a time, it should be faster to use regular GPRs rather than SIMD registers, since Arm has an instruction (LDP) that loads 16 bytes into a pair of GPRs.
- The tail bytes should be handled by loading the last 16 bytes, rather than one byte at a time.
- Once we've found a difference, we should compute matchp and the compare result from the loaded data rather than using the byte loop.
- Use at most 2 loads when len < 16 (and ignore matchp, since it isn't worth it in that case).
- Try to nudge the compiler into using conditional instructions such as CSET/CSEL/CSINC rather than branching for unpredictable branches, to avoid the misprediction penalty.
I've put this all together in this godbolt. Clang seems to do slightly better than gcc, but gcc isn't bad with this code.
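The godbolt link is not reproduced here, so the following is only a rough, portable C sketch of the ideas in the list above, not the code from the godbolt. All function names (`load_be64`, `cmp16`, `cmp_long`, `cmp_short`, `lex_compare`, `match_len`) are hypothetical, and it assumes GCC or Clang for the byte-order macros and `__builtin_*` helpers. For lengths below 4 it falls back to three single-byte loads, which is a simplification of mine rather than something stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Unaligned loads as big-endian integers, so integer order equals byte order.
 * Assumes GCC/Clang (predefined byte-order macros and __builtin_* helpers). */
static inline uint64_t load_be64(const uint8_t *p) {
    uint64_t v;
    memcpy(&v, p, sizeof(v));
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap64(v);
#endif
    return v;
}
static inline uint32_t load_be32(const uint8_t *p) {
    uint32_t v;
    memcpy(&v, p, sizeof(v));
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap32(v);
#endif
    return v;
}

/* Three-way compare written so compilers can emit CSET/CSINC-style code. */
static inline int cmp_u64(uint64_t x, uint64_t y) { return (x > y) - (x < y); }

/* Compare one 16-byte window as two 64-bit halves; *eq receives the number of
 * leading equal bytes (16 if the windows are identical). */
static inline int cmp16(const uint8_t *a, const uint8_t *b, size_t *eq) {
    uint64_t a0 = load_be64(a), a1 = load_be64(a + 8);
    uint64_t b0 = load_be64(b), b1 = load_be64(b + 8);
    int hi_diff = a0 != b0;
    uint64_t x = hi_diff ? a0 : a1; /* select the first differing half */
    uint64_t y = hi_diff ? b0 : b1;
    uint64_t d = x ^ y;
    /* clz(0) is undefined, so the all-equal case is handled separately. */
    *eq = d == 0 ? 16 : (size_t)(hi_diff ? 0 : 8) + (size_t)__builtin_clzll(d) / 8;
    return cmp_u64(x, y);
}

/* n >= 16: whole windows first, then one overlapping load of the last 16. */
static int cmp_long(const uint8_t *a, const uint8_t *b, size_t n, size_t *matchp) {
    size_t i, eq;
    int r;
    assert(n >= 16);
    for (i = 0; i + 16 <= n; i += 16) {
        r = cmp16(a + i, b + i, &eq);
        if (eq != 16) { *matchp = i + eq; return r; }
    }
    if (i == n) { *matchp = n; return 0; }
    /* Bytes shared with the previous window are already known to be equal. */
    r = cmp16(a + n - 16, b + n - 16, &eq);
    *matchp = n - 16 + eq;
    return r;
}

/* n < 16: two overlapping loads per key for n >= 4, three byte loads below. */
static int cmp_short(const uint8_t *a, const uint8_t *b, size_t n) {
    uint64_t x = 0, y = 0;
    if (n >= 8) {
        x = load_be64(a); y = load_be64(b);
        if (x == y) { x = load_be64(a + n - 8); y = load_be64(b + n - 8); }
    } else if (n >= 4) {
        x = ((uint64_t)load_be32(a) << 32) | load_be32(a + n - 4);
        y = ((uint64_t)load_be32(b) << 32) | load_be32(b + n - 4);
    } else if (n > 0) {
        x = ((uint64_t)a[0] << 16) | ((uint64_t)a[n >> 1] << 8) | a[n - 1];
        y = ((uint64_t)b[0] << 16) | ((uint64_t)b[n >> 1] << 8) | b[n - 1];
    }
    return cmp_u64(x, y);
}

/* Lexicographic compare; on equal prefixes the shorter key sorts first. */
static int lex_compare(const void *ap, size_t alen, const void *bp, size_t blen,
                       size_t *matchp) {
    const uint8_t *a = ap, *b = bp;
    size_t n = alen < blen ? alen : blen, match = 0;
    int r = n >= 16 ? cmp_long(a, b, n, &match) : cmp_short(a, b, n);
    if (matchp != NULL) *matchp = match; /* left at 0 for short keys */
    return r != 0 ? r : (alen > blen) - (alen < blen);
}

/* Convenience wrapper: number of leading bytes known to match. */
static size_t match_len(const void *a, size_t alen, const void *b, size_t blen) {
    size_t m;
    (void)lex_compare(a, alen, b, blen, &m);
    return m;
}
```

For example, `lex_compare("abcd", 4, "abc", 3, NULL)` is positive because the shared prefix is equal and the first key is longer. Reporting a match count of 0 for short keys is a safe lower bound, which is one way to read "ignore matchp" above.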
The x86 path could also use a review. It could use some of these techniques, but the >=16 byte path should just use vectors since they are cheaper there. And we should eliminate the alignment check there too, since the perf advantage of movdqa over movdqu has disappeared on modern CPUs.
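As a hedged illustration of what an alignment-free vector path for >=16 bytes might look like on x86, here is a minimal SSE2 sketch. The names `diff_mask16`, `first_diff`, and `cmp_vec` are hypothetical, the scalar fallback exists only so the sketch builds on non-x86 targets, and `__builtin_ctz` assumes GCC or Clang. It is not the existing WiredTiger x86 code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Bit i of the result is set when byte i of the two 16-byte windows differs. */
static inline unsigned diff_mask16(const uint8_t *a, const uint8_t *b) {
#if defined(__SSE2__)
    /* loadu: no alignment requirement on either pointer. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    unsigned eq = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb));
    return ~eq & 0xFFFFu;
#else
    unsigned m = 0; /* scalar fallback so the sketch builds on other targets */
    for (unsigned i = 0; i < 16; i++)
        m |= (unsigned)(a[i] != b[i]) << i;
    return m;
#endif
}

/* Index of the first differing byte, or n if equal. Requires n >= 16. */
static size_t first_diff(const void *ap, const void *bp, size_t n) {
    const uint8_t *a = ap, *b = bp;
    size_t i = 0;
    unsigned m;
    assert(n >= 16);
    while ((m = diff_mask16(a + i, b + i)) == 0) {
        if (i + 16 >= n)
            return n; /* the window just compared reached the end */
        i += 16;
        if (i + 16 > n)
            i = n - 16; /* final window overlaps already-compared bytes */
    }
    return i + (size_t)__builtin_ctz(m); /* lowest set bit = first difference */
}

/* Three-way compare of n >= 16 bytes, decided at the first differing byte. */
static int cmp_vec(const void *ap, const void *bp, size_t n) {
    const uint8_t *a = ap, *b = bp;
    size_t k = first_diff(a, b, n);
    return k == n ? 0 : (a[k] > b[k]) - (a[k] < b[k]);
}
```

The value returned by `first_diff` is the count of leading matching bytes, so it could feed matchp directly without a separate byte loop.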
- duplicates
  - WT-12764 failed: cppsuite-bounded-cursor-prefix-indices-stress on ubuntu2004-stress-nonstandalone [wiredtiger @ bcad797a] (Closed)
- is depended on by
  - WT-12667 Investigate reducing memcpys in __lex_compare_lt_16 (Closed)
- is related to
  - WT-12841 Simplify the lex compare code with standard memcmp function (Open)
- related to
  - WT-11903 Optimize WT short key comparison function __wt_lex_compare_short (Closed)