Add support for RVV 1.0#542
Conversation
598dea9 to
53ab2ed
Compare
54233d6 to
f46a1cc
Compare
f46a1cc to
60dd32b
Compare
build.rs
Outdated
| // Try rva23u64 first (RVA23 profile includes RVV 1.0), then fall back to rv64gcv. | ||
| // This matches the priority in CMakeLists.txt. | ||
| let march_flag = if build.is_flag_supported("-march=rva23u64").unwrap_or(false) { | ||
| "-march=rva23u64" | ||
| } else { | ||
| "-march=rv64gcv" | ||
| }; |
There was a problem hiding this comment.
rva23u64 has vector extension, but it also has a lot more, just like rv64gcv. I don't think this is the right approach.
My use case could benefit from this PR while the embedded-ish target only implements Zve64x (with E extension too, so there are only 16 general purpose registers too).
Instead of changing the architecture, consider just enabling the minimum required extensions used here and nothing else or in use cases like mine things will blow up in runtime unexpectedly.
| fn is_riscv64() -> bool { | ||
| let arch = &target_components()[0]; | ||
| arch == "riscv64gc" || arch == "riscv64a23" | ||
| } | ||
|
|
||
| fn is_rvv() -> bool { | ||
| // Explicit RVV feature flag | ||
| if defined("CARGO_FEATURE_RVV") { | ||
| return true; | ||
| } | ||
|
|
||
| // riscv64a23 target has built-in RVV support | ||
| let arch = &target_components()[0]; | ||
| if arch == "riscv64a23" { | ||
| return true; | ||
| } | ||
|
|
||
| false | ||
| } |
There was a problem hiding this comment.
This is very limited, I'd be nicer to detect the availability of extensions rather than parsing target triple for just a few known good values. I have a custom target that is called riscv64-unknown-none-abundance and it will support vector extension in the future, but this feature detection will be unable to take advantage of it automatically.
The fact that override exists helps, but it'd be much nicer if feature detection just worked out of the box.
|
@nazar-pc Thanks for reviewing.
I'm currently investigating how to implement a more flexible detection logic that enables RISC-V V extension on supported platforms without bringing in unimplemented instructions. Since neither CMake nor Cargo supports this natively, I may write some ad-hoc Rust/C code to handle it. |
|
@nazar-pc I've updated the detection logic. I also modified the |
839c344 to
34b31d5
Compare
build.rs
Outdated
| if test_rvv_runtime_support() { | ||
| return true; | ||
| } else { | ||
| println!("cargo:warning=No RVV support detected, using portable implementation"); |
There was a problem hiding this comment.
Not sure it is a good idea to print a warning. Many projects reject warnings and this one seems to be unavoidable on RISC-V, which will be a problem for external users.
| let test_code = b" | ||
| #include <riscv_vector.h> | ||
| int main() { | ||
| size_t vl = __riscv_vsetvlmax_e32m1(); |
There was a problem hiding this comment.
This is a single intrinsic. Does it cover all extensions used by this implementation or maybe more of them need to be added?
There was a problem hiding this comment.
This time all functions used in the arch specific code are tested.
build.rs
Outdated
| let cc = env::var("CC").unwrap_or_else(|_| "gcc".to_string()); | ||
| let compile = Command::new(&cc) | ||
| .args(&["-march=rv64gcv", &test_c, "-o", &test_bin]) | ||
| .output(); |
There was a problem hiding this comment.
Why not compiling with https://docs.rs/cc instead?
There was a problem hiding this comment.
cc does not compile C source to binary-it only produces static library.
There was a problem hiding this comment.
but I still use cc to avoid manually setting up env vars.
src/platform.rs
Outdated
| // We use 16 as the upper bound, when future hardwares with degree > 8 released | ||
| // alter this constant |
There was a problem hiding this comment.
Virtual targets often have 2048 bits width, including the one I'm working with. What does this constant impact in practice?
There was a problem hiding this comment.
MAX_SIMD_DEGREE controls the static buffer sizes in blake3.c (C library path) and platform.rs (Rust path). If it's smaller than the runtime simd_degree(), blake3_hash_many writes beyond the out buffer, causing a buffer overflow. If it's larger than needed, the only cost is extra stack space: 64 * 32 = 2KB for out buffers plus similar for ArrayVec capacities. I initially set it to 16, but 64 covers VLEN up to 2048 and the stack overhead is negligible.
| build_wasm32_simd(); | ||
| } | ||
|
|
||
| if is_rvv() && is_big_endian() { |
There was a problem hiding this comment.
is_rvv() should check for endianness too in the autodetection case. Otherwise it'll be impossible to compile for big-endian RISC-V targets, however uncommon they are.
| if is_riscv64() && is_rvv() { | ||
| println!("cargo:rustc-cfg=blake3_rvv"); | ||
| build_rvv_c_intrinsics(); | ||
| } |
There was a problem hiding this comment.
I think it should always build RVV support unless pure feature is selected. When RVV support is detected during compile time corresponding implementation will be used unconditionally, but if not, I think it still makes sense to do runtime CPU feature detection unless it is not possible.
There was a problem hiding this comment.
Unfortunately, although RISC-V does provide an x86 cpuid-like functionality, it's M-mode only (misa register). Or we can query the OS for supported ISAs, but the querying API varies by OS. On Linux, it's getauxval(AT_HWCAP), on FreeBSD, it's elf_aux_info. And I think we shouldn't assume that BLAKE3 won't run on bare metal, so runtime ISA dispatching should account for bare metal, Linux, and FreeBSD environments—it's too complicated.
See also: https://news.ycombinator.com/item?id=24002931
There was a problem hiding this comment.
I see. And target_feature in Rust doesn't seem to expose v even in Nightly. Very unfortunate.
That said, you could still conditionally check for features in OS-specific way when compiled for the OS.
61ddf51 to
0b6eb2e
Compare
0b6eb2e to
eecd748
Compare
| drop(f); | ||
|
|
||
| let mut build = cc::Build::new(); | ||
| build.flag("-march=rv64gcv"); |
There was a problem hiding this comment.
This is certainly testing something, but it is testing rv64gcv rather than the actual target the code will be compiled for. For example, rv64imv should theoretically work too, and maybe even rv64izve64x is sufficient.
Even though it is not possible to check for vector extension even in Nightly Rust right now, this will serve as a piece of documentation for things actually used in the code. Intrinsics above are already good, but actual extensions will be even better.
| if is_riscv64() && is_rvv() { | ||
| println!("cargo:rustc-cfg=blake3_rvv"); | ||
| build_rvv_c_intrinsics(); | ||
| } |
There was a problem hiding this comment.
I see. And target_feature in Rust doesn't seem to expose v even in Nightly. Very unfortunate.
That said, you could still conditionally check for features in OS-specific way when compiled for the OS.
Add support for RISC-V V extension backend.
The RVV-specific code is implemented with reference to the earlier ARM NEON version. It is mainly implemented in C since Rust RVV intrinsic hasn't been fully implemented yet. It may look weird when finding out all vectorized variables are defined separately instead of being defined as a 16-size array. That was because in RVV,
vuint32m1_ts are sizeless types while in ARM NEONuint32x4_ts are 16-byte values and arrays are not allowed to be constructed with sizeless types.RVV defines a variable-length vector register (VLEN is implementation-defined, not fixed by the ISA). This implementation adapts to the hardware's actual VLEN at runtime by querying vsetvlmax to determine the SIMD degree, rather than hardcoding a fixed lane count.
The current
MAX_SIMD_DEGREEis set to 16, which covers VLEN up to 512-bit. If future hardware supports VLEN > 512, users will need to patchMAX_SIMD_DEGREEinblake3_impl.h(C side) andplatform.rs(Rust side) accordingly and recompile.The following tests are conducted on SG2044 SoC(with 64 T-HEAD C920 cores). The RVV speedup decreases as thread count grows (2.36x single-threaded → 1.55x at 64 threads), as threading overhead and memory bandwidth contention increasingly dominate over per-core compute gains.