Byte count has poor codegen with autovectorization #136500
Labels: A-autovectorization, A-LLVM, C-optimization, T-compiler
I'm currently writing some code that counts the occurrences of a specific byte (newlines in this example) in a large byte slice (>1 GB).
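The issue's original snippet is not reproduced in this text. As a hedged illustration only, a typical straightforward version of such a count (assumed here, with the hypothetical name `count_newlines`) compares every byte against `b'\n'` and counts the matches. This is the kind of loop that LLVM's autovectorizer has to handle on its own:

```rust
/// Hypothetical naive newline counter (illustrative, not the issue author's code):
/// compares each byte to b'\n' and counts how many match.
pub fn count_newlines(haystack: &[u8]) -> usize {
    haystack.iter().filter(|&&b| b == b'\n').count()
}

fn main() {
    let text = b"first line\nsecond line\n\nlast";
    println!("{}", count_newlines(text));
}
```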
When compiled with `-C target-feature=+avx2`, autovectorization does emit AVX2 instructions, but the result is still around 2x slower than `bytecount`. Using `portable_simd`, the code can be made faster, and that version performs similarly to `bytecount::count`.

Compiler Explorer: https://rust.godbolt.org/z/6b5bTKoa9
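The `portable_simd` version is not reproduced here either, and `portable_simd` requires nightly. The sketch below is a stable-Rust illustration of a technique that SIMD byte counters generally rely on, not the issue's code or `bytecount`'s implementation: keep narrow `u8` counters, one per lane of a 32-byte block (the width of an AVX2 register), and flush them into a `usize` total before any counter can overflow, i.e. after at most 255 blocks. The function names `count_naive` and `count_chunked` are hypothetical, and whether LLVM vectorizes this shape better than the naive loop is an assumption to check in Compiler Explorer, not a measured result.

```rust
/// Naive reference count: one comparison per byte, widened into a usize.
pub fn count_naive(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

/// Chunked count (illustrative sketch): per-lane u8 counters for 32-byte
/// blocks, flushed into a usize total every 255 blocks so they never overflow.
pub fn count_chunked(haystack: &[u8], needle: u8) -> usize {
    const LANES: usize = 32; // assumed block width, matching one AVX2 register
    const MAX_BLOCKS: usize = 255; // a u8 counter can take at most 255 increments

    let mut total = 0usize;
    let mut blocks = haystack.chunks_exact(LANES);
    loop {
        let mut lane_counts = [0u8; LANES];
        let mut processed = 0;
        // Process up to 255 full blocks; each lane gains at most 1 per block.
        for block in blocks.by_ref().take(MAX_BLOCKS) {
            for (count, &b) in lane_counts.iter_mut().zip(block) {
                *count += (b == needle) as u8;
            }
            processed += 1;
        }
        // Widen the narrow lane counters into the running total.
        total += lane_counts.iter().map(|&c| c as usize).sum::<usize>();
        if processed < MAX_BLOCKS {
            break; // no full blocks left
        }
    }
    // Trailing bytes that do not fill a whole block are counted one by one.
    total + count_naive(blocks.remainder(), needle)
}

fn main() {
    let data = vec![b'\n'; 255 * 32 * 2 + 5];
    println!("{} {}", count_naive(&data, b'\n'), count_chunked(&data, b'\n'));
}
```

The 255-block limit is what keeps the inner loop on 8-bit lanes: widening to `usize` on every byte is what forces a naive loop to use wider vector elements, which is one plausible reason it processes fewer bytes per instruction.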