
Add support for RVV 1.0 #542

Open
MahnoKropotkinvich wants to merge 3 commits into BLAKE3-team:master from MahnoKropotkinvich:dev-rvv

@MahnoKropotkinvich (Author) commented Feb 17, 2026

Add a RISC-V Vector (RVV) extension backend.

The RVV-specific code is modeled on the existing ARM NEON implementation. It is written mainly in C because Rust's RVV intrinsics are not yet fully implemented. It may look odd that every vector variable is declared separately rather than as a 16-element array. This is because RVV types such as vuint32m1_t are sizeless, whereas NEON's uint32x4_t is a fixed 16-byte value, and C does not allow arrays of sizeless types.

In RVV, the vector register length (VLEN) is implementation-defined rather than fixed by the ISA. This implementation adapts to the hardware's actual VLEN at runtime, querying vsetvlmax to determine the SIMD degree instead of hardcoding a lane count.

MAX_SIMD_DEGREE is currently set to 16, which covers VLEN up to 512 bits. If future hardware supports VLEN > 512, users will need to patch MAX_SIMD_DEGREE in blake3_impl.h (C side) and platform.rs (Rust side) accordingly and recompile.
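As a rough illustration of how VLEN, the SIMD degree, and MAX_SIMD_DEGREE relate, the sketch below assumes the degree equals VLMAX for 32-bit elements at LMUL=1 (what `__riscv_vsetvlmax_e32m1()` reports, i.e. VLEN / 32). That assumption is consistent with 16 lanes covering VLEN up to 512 bits. The function names are hypothetical, not taken from the PR.

```rust
/// Lanes per vector register for 32-bit elements at LMUL=1.
/// Assumption: this equals the backend's SIMD degree (VLMAX = VLEN / SEW).
fn simd_degree_from_vlen(vlen_bits: usize) -> usize {
    vlen_bits / 32
}

/// Largest VLEN (in bits) whose degree still fits a given MAX_SIMD_DEGREE.
fn max_vlen_covered(max_simd_degree: usize) -> usize {
    max_simd_degree * 32
}

/// Hypothetical guard: true if the runtime degree fits the compile-time bound,
/// i.e. the RVV path would not overrun buffers sized by MAX_SIMD_DEGREE.
fn rvv_degree_fits(vlen_bits: usize, max_simd_degree: usize) -> bool {
    simd_degree_from_vlen(vlen_bits) <= max_simd_degree
}
```

With MAX_SIMD_DEGREE = 16, a 1024-bit implementation yields a degree of 32 and fails the check, which is exactly the case where the constant would have to be patched.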

The following benchmarks were run on an SG2044 SoC (64 T-HEAD C920 cores). The RVV speedup decreases as the thread count grows (from 2.36x single-threaded to 1.55x at 64 threads), as threading overhead and memory-bandwidth contention increasingly dominate over per-core compute gains.

| Threads | Size | Portable time | Portable throughput | RVV time | RVV throughput | Speedup |
|---|---|---|---|---|---|---|
| 1 | 10M | 65 ms | 0.15 GB/s | 30 ms | 0.32 GB/s | 2.16x |
| 1 | 64M | 385 ms | 0.16 GB/s | 166 ms | 0.37 GB/s | 2.31x |
| 1 | 256M | 1520 ms | 0.16 GB/s | 647 ms | 0.38 GB/s | 2.35x |
| 1 | 1G | 6070 ms | 0.16 GB/s | 2570 ms | 0.38 GB/s | 2.36x |
| 4 | 256M | 395 ms | 0.63 GB/s | 180 ms | 1.38 GB/s | 2.19x |
| 4 | 1G | 1560 ms | 0.63 GB/s | 701 ms | 1.42 GB/s | 2.23x |
| 4 | 4G | 6270 ms | 0.63 GB/s | 2780 ms | 1.43 GB/s | 2.24x |
| 8 | 256M | 210 ms | 1.19 GB/s | 102 ms | 2.45 GB/s | 2.05x |
| 8 | 1G | 819 ms | 1.22 GB/s | 389 ms | 2.57 GB/s | 2.10x |
| 8 | 4G | 3240 ms | 1.23 GB/s | 1520 ms | 2.62 GB/s | 2.12x |
| 16 | 256M | 117 ms | 2.13 GB/s | 63 ms | 3.96 GB/s | 1.85x |
| 16 | 1G | 443 ms | 2.25 GB/s | 228 ms | 4.38 GB/s | 1.94x |
| 16 | 4G | 1720 ms | 2.31 GB/s | 875 ms | 4.57 GB/s | 1.97x |
| 32 | 256M | 74 ms | 3.37 GB/s | 46 ms | 5.43 GB/s | 1.60x |
| 32 | 1G | 257 ms | 3.89 GB/s | 150 ms | 6.66 GB/s | 1.71x |
| 32 | 4G | 974 ms | 4.10 GB/s | 545 ms | 7.33 GB/s | 1.78x |
| 64 | 256M | 59 ms | 4.23 GB/s | 46 ms | 5.43 GB/s | 1.28x |
| 64 | 1G | 179 ms | 5.58 GB/s | 119 ms | 8.40 GB/s | 1.50x |
| 64 | 4G | 619 ms | 6.46 GB/s | 398 ms | 10.05 GB/s | 1.55x |
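The derived columns can be reproduced from the timings. The sketch below assumes Speedup = portable time / RVV time, and that sizes are binary (1G = 2^30 bytes) with throughput expressed in GiB/s. Both assumptions are consistent with rows such as 4G in 398 ms giving about 10.05. Because the listed times are rounded, recomputed values can differ in the last digit: 65 ms / 30 ms gives 2.17, while the table lists 2.16x.

```rust
/// Bytes in one GiB (assumed unit of the Size column).
const GIB: f64 = (1u64 << 30) as f64;

/// Speedup column: portable time divided by RVV time.
fn speedup(portable_ms: f64, rvv_ms: f64) -> f64 {
    portable_ms / rvv_ms
}

/// Throughput in GiB/s for `bytes` hashed in `ms` milliseconds.
fn throughput_gib_per_s(bytes: f64, ms: f64) -> f64 {
    bytes / GIB / (ms / 1000.0)
}
```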

@MahnoKropotkinvich force-pushed the dev-rvv branch 2 times, most recently from 598dea9 to 53ab2ed on February 24, 2026 12:37
@MahnoKropotkinvich changed the title from "WIP: Add support for RVV 1.0" to "Add support for RVV 1.0" on February 24, 2026
@MahnoKropotkinvich force-pushed the dev-rvv branch 8 times, most recently from 54233d6 to f46a1cc on February 25, 2026 16:54
@MahnoKropotkinvich marked this pull request as draft on February 25, 2026 17:07
@nazar-pc (Contributor) left a comment

Also see #484 for related discussion. There is a branch there with an assembly implementation that Jack was working on.

build.rs (outdated)
Comment on lines +330 to +336
// Try rva23u64 first (RVA23 profile includes RVV 1.0), then fall back to rv64gcv.
// This matches the priority in CMakeLists.txt.
let march_flag = if build.is_flag_supported("-march=rva23u64").unwrap_or(false) {
    "-march=rva23u64"
} else {
    "-march=rv64gcv"
};
@nazar-pc (Contributor):

rva23u64 includes the vector extension, but it also includes a lot more, just like rv64gcv. I don't think this is the right approach.

My use case could benefit from this PR, but my embedded-ish target implements only Zve64x (plus the E extension, so there are only 16 general-purpose registers).

Instead of changing the architecture, consider enabling only the minimum extensions this code actually uses and nothing else; otherwise, in use cases like mine, things will blow up unexpectedly at runtime.

Comment on lines +102 to +120
fn is_riscv64() -> bool {
    let arch = &target_components()[0];
    arch == "riscv64gc" || arch == "riscv64a23"
}

fn is_rvv() -> bool {
    // Explicit RVV feature flag
    if defined("CARGO_FEATURE_RVV") {
        return true;
    }

    // riscv64a23 target has built-in RVV support
    let arch = &target_components()[0];
    if arch == "riscv64a23" {
        return true;
    }

    false
}
@nazar-pc (Contributor):

This is very limited. It would be nicer to detect the availability of extensions rather than parsing the target triple for a few known-good values. I have a custom target called riscv64-unknown-none-abundance that will support the vector extension in the future, but this detection logic won't be able to take advantage of it automatically.

The fact that an override exists helps, but it would be much nicer if feature detection just worked out of the box.

@MahnoKropotkinvich (Author) commented Mar 3, 2026

@nazar-pc Thanks for the review.
So the concerns are that:

  1. rva23u64 is too broad and may enable instructions that many platforms do not implement;
  2. the current ISA detection logic is too limited.

I'm investigating more flexible detection logic that enables the RISC-V V extension on supported platforms without pulling in unimplemented instructions. Since neither CMake nor Cargo supports this natively, I may write some ad hoc Rust/C code to handle it.

@MahnoKropotkinvich (Author) commented Mar 4, 2026

@nazar-pc I've updated the detection logic.
For native compilation, both CMake and Cargo now test whether RVV code can be compiled and run successfully.
For cross-compilation, since it's difficult to test target ISA features from the host, users must enable RVV support manually.

I also modified the is_rvv() logic; I think it's now fine to assume a 64-bit RISC-V platform whenever the target string starts with the riscv64 prefix.
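The policy described above could look roughly like the sketch below. It probes only when building natively, requires an explicit opt-in when cross-compiling, and treats any target whose architecture component starts with `riscv64` as 64-bit RISC-V. `HOST`, `TARGET`, and `CARGO_FEATURE_RVV` are environment variables Cargo passes to build scripts; the last is set only when an `rvv` feature is enabled, as the `is_rvv()` excerpt assumes. The helper names and the `probe` closure are hypothetical.

```rust
use std::env;

/// Architecture component of a target triple, e.g. "riscv64gc" in
/// "riscv64gc-unknown-linux-gnu".
fn target_arch_component(target: &str) -> &str {
    target.split('-').next().unwrap_or("")
}

/// Prefix match, so riscv64gc, riscv64a23 and custom targets such as
/// riscv64-unknown-none-abundance are all recognized.
fn is_riscv64_target(target: &str) -> bool {
    target_arch_component(target).starts_with("riscv64")
}

/// Hypothetical decision: non-riscv64 targets never get RVV; an explicit
/// opt-in wins; otherwise only a native build (host == target) runs the probe.
fn should_enable_rvv(explicit: bool, host: &str, target: &str, probe: impl FnOnce() -> bool) -> bool {
    if !is_riscv64_target(target) {
        return false;
    }
    if explicit {
        return true;
    }
    if host != target {
        // Cross-compiling: a probe binary built for the target cannot run here.
        return false;
    }
    probe()
}

/// Reads the inputs from Cargo's build-script environment.
fn rvv_from_cargo_env(probe: impl FnOnce() -> bool) -> bool {
    let explicit = env::var_os("CARGO_FEATURE_RVV").is_some();
    let host = env::var("HOST").unwrap_or_default();
    let target = env::var("TARGET").unwrap_or_default();
    should_enable_rvv(explicit, &host, &target, probe)
}
```

Comparing `HOST` and `TARGET` literally is deliberately conservative: a riscv64 glibc host building for a riscv64 musl target is treated as a cross build, even though the probe might well run there.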

build.rs (outdated)
if test_rvv_runtime_support() {
    return true;
} else {
    println!("cargo:warning=No RVV support detected, using portable implementation");
@nazar-pc (Contributor):

I'm not sure printing a warning is a good idea. Many projects don't tolerate warnings, and this one seems unavoidable on RISC-V, which will be a problem for downstream users.

@MahnoKropotkinvich (Author):

Removed.

let test_code = b"
#include <riscv_vector.h>
int main() {
size_t vl = __riscv_vsetvlmax_e32m1();
@nazar-pc (Contributor):

This is a single intrinsic. Does it cover all the extensions this implementation uses, or do more need to be added?

@MahnoKropotkinvich (Author), Mar 10, 2026:

The probe now tests every intrinsic used in the architecture-specific code.
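One way to keep the probe checkable is to store its source next to a list of the intrinsics the backend uses and verify that the two stay in sync. The sketch assumes the RVV C intrinsics v1.0 naming (the `__riscv_` prefix). The intrinsic list is an illustrative subset covering the add, xor, and rotate (shift plus or) operations that BLAKE3's G function needs; it is not the PR's actual list. `RVV_PROBE_C`, `REQUIRED_INTRINSICS`, and `missing_from_probe` are hypothetical names.

```rust
/// Hypothetical probe source exercising each listed intrinsic once.
const RVV_PROBE_C: &str = r#"
#include <riscv_vector.h>
#include <stddef.h>
#include <stdint.h>

int main(void) {
    uint32_t in[4] = {1, 2, 3, 4};
    uint32_t out[4] = {0, 0, 0, 0};
    size_t vl = __riscv_vsetvl_e32m1(4);
    vuint32m1_t a = __riscv_vle32_v_u32m1(in, vl);
    vuint32m1_t b = __riscv_vadd_vv_u32m1(a, a, vl);
    b = __riscv_vxor_vv_u32m1(b, a, vl);
    /* rotate right by 7: (b >> 7) | (b << 25) */
    b = __riscv_vor_vv_u32m1(__riscv_vsrl_vx_u32m1(b, 7, vl),
                             __riscv_vsll_vx_u32m1(b, 25, vl), vl);
    __riscv_vse32_v_u32m1(out, b, vl);
    return __riscv_vsetvlmax_e32m1() == 0;
}
"#;

/// Intrinsics the (hypothetical) backend relies on; keep in sync with the C code.
const REQUIRED_INTRINSICS: &[&str] = &[
    "__riscv_vsetvl_e32m1",
    "__riscv_vsetvlmax_e32m1",
    "__riscv_vle32_v_u32m1",
    "__riscv_vse32_v_u32m1",
    "__riscv_vadd_vv_u32m1",
    "__riscv_vxor_vv_u32m1",
    "__riscv_vor_vv_u32m1",
    "__riscv_vsll_vx_u32m1",
    "__riscv_vsrl_vx_u32m1",
];

/// Returns the required intrinsics that the probe source never mentions.
fn missing_from_probe<'a>(probe: &str, required: &[&'a str]) -> Vec<&'a str> {
    required
        .iter()
        .copied()
        .filter(|name| !probe.contains(*name))
        .collect()
}
```

Compiling such a probe only shows that the toolchain accepts these intrinsics. As in this PR's build script, it still has to be run on the target to show that the hardware supports them.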

build.rs (outdated)
Comment on lines +142 to +145
let cc = env::var("CC").unwrap_or_else(|_| "gcc".to_string());
let compile = Command::new(&cc)
    .args(&["-march=rv64gcv", &test_c, "-o", &test_bin])
    .output();
@nazar-pc (Contributor):

Why not compile with https://docs.rs/cc instead?

@MahnoKropotkinvich (Author):

cc does not compile C sources into executables; it only produces static libraries.

@MahnoKropotkinvich (Author), Mar 10, 2026:

That said, I still use cc so that I don't have to set up the compiler environment variables manually.

src/platform.rs (outdated)
Comment on lines +22 to +23
// We use 16 as the upper bound; if future hardware with a degree > 16 is released,
// alter this constant
@nazar-pc (Contributor):

Virtual targets often have a 2048-bit vector width, including the one I'm working with. What does this constant affect in practice?

@MahnoKropotkinvich (Author):

MAX_SIMD_DEGREE sets the sizes of static buffers in blake3.c (C library path) and platform.rs (Rust path). If it is smaller than the runtime simd_degree(), blake3_hash_many writes past the end of the out buffer, causing a buffer overflow. If it is larger than needed, the only cost is extra stack space: 64 * 32 = 2 KB for the out buffers, plus a similar amount for the ArrayVec capacities. I initially set it to 16, but 64 covers VLEN up to 2048 bits and the stack overhead is negligible.
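The sizing argument can be checked with a small Rust model. It assumes `OUT_LEN = 32` (BLAKE3's 32-byte chaining value) and the proposed `MAX_SIMD_DEGREE = 64`. `OutBuf` and `write_batch` are hypothetical stand-ins for the real buffers, not code from the PR. Where C would silently write past the end of the buffer, the sketch rejects an oversized degree explicitly; plain Rust slicing would panic instead.

```rust
/// BLAKE3 chaining value / default output size in bytes.
const OUT_LEN: usize = 32;
/// Value proposed in the thread (covers VLEN up to 2048 with 32-bit lanes).
const MAX_SIMD_DEGREE: usize = 64;

/// Hypothetical stand-in for the per-batch output buffer (64 * 32 = 2048 bytes).
type OutBuf = [u8; MAX_SIMD_DEGREE * OUT_LEN];

/// Hypothetical: write `degree` outputs into `out`, refusing instead of
/// overflowing when `degree` exceeds the compile-time bound.
fn write_batch(out: &mut OutBuf, degree: usize, fill: u8) -> Result<(), String> {
    let needed = degree * OUT_LEN;
    if needed > out.len() {
        return Err(format!(
            "simd degree {degree} exceeds MAX_SIMD_DEGREE {MAX_SIMD_DEGREE}"
        ));
    }
    // Stand-in for writing `degree` 32-byte chaining values.
    out[..needed].fill(fill);
    Ok(())
}
```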

build_wasm32_simd();
}

if is_rvv() && is_big_endian() {
@nazar-pc (Contributor):

is_rvv() should also check endianness in the autodetection case. Otherwise it will be impossible to compile for big-endian RISC-V targets, however uncommon they are.
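A sketch of the reviewer's suggestion uses the `CARGO_CFG_TARGET_ENDIAN` variable that Cargo sets for build scripts (`"little"` or `"big"`). Autodetection quietly falls back to the portable code on big-endian targets, while an explicit opt-in still fails loudly, assuming that is what the existing `is_rvv() && is_big_endian()` check is meant to do. `rvv_decision` and its parameters are hypothetical.

```rust
use std::env;

/// Hypothetical: combine the explicit opt-in, the autodetection result,
/// and the target endianness into one decision.
fn rvv_decision(explicit: bool, autodetected: bool, target_endian: &str) -> Result<bool, String> {
    let little = target_endian == "little";
    match (explicit, little) {
        // An explicit request on a big-endian target is a configuration error.
        (true, false) => Err("RVV backend requires a little-endian target".to_string()),
        (true, true) => Ok(true),
        // Autodetection never enables RVV on big-endian targets.
        (false, _) => Ok(autodetected && little),
    }
}

/// Reads the target endianness Cargo exposes to build scripts.
/// An unset variable yields "", which is treated as "not little-endian".
fn target_endian() -> String {
    env::var("CARGO_CFG_TARGET_ENDIAN").unwrap_or_default()
}
```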

@MahnoKropotkinvich (Author):

Okay.

if is_riscv64() && is_rvv() {
    println!("cargo:rustc-cfg=blake3_rvv");
    build_rvv_c_intrinsics();
}
@nazar-pc (Contributor):

I think RVV support should always be built unless the pure feature is selected. When RVV support is detected at compile time, the corresponding implementation should be used unconditionally; if not, it still makes sense to do runtime CPU feature detection wherever that is possible.
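The structure the reviewer describes could be sketched like this: use RVV unconditionally when it was enabled at compile time, and otherwise fall back to a one-time runtime check. `COMPILED_WITH_RVV` stands in for the `blake3_rvv` cfg the build script sets. The runtime probe is a hypothetical closure that returns `None` where no detection mechanism exists (for example on bare metal).

```rust
use std::sync::OnceLock;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Backend {
    Rvv,
    Portable,
}

/// Compile-time enablement wins; otherwise consult the runtime probe.
/// `None` from the probe means "no way to tell here" and falls back to portable.
fn select_backend(compile_time_rvv: bool, runtime_probe: impl FnOnce() -> Option<bool>) -> Backend {
    if compile_time_rvv {
        return Backend::Rvv;
    }
    match runtime_probe() {
        Some(true) => Backend::Rvv,
        Some(false) | None => Backend::Portable,
    }
}

/// Hypothetical stand-in for `cfg!(blake3_rvv)`.
const COMPILED_WITH_RVV: bool = false;

static BACKEND: OnceLock<Backend> = OnceLock::new();

/// Detect once, then reuse the cached choice on every call.
fn backend() -> Backend {
    *BACKEND.get_or_init(|| select_backend(COMPILED_WITH_RVV, || None))
}
```

Caching the choice in a `OnceLock` keeps the (possibly OS-specific) probe off the hashing hot path.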

@MahnoKropotkinvich (Author), Mar 10, 2026:

Unfortunately, although RISC-V does provide x86 cpuid-like functionality (the misa register), it is accessible only in M-mode. Alternatively, we could query the OS for supported ISA extensions, but the API varies by OS: on Linux it's getauxval(AT_HWCAP), and on FreeBSD it's elf_aux_info. I also don't think we should assume BLAKE3 never runs on bare metal, so runtime ISA dispatch would have to account for bare-metal, Linux, and FreeBSD environments, which is too complicated.
See also: https://news.ycombinator.com/item?id=24002931

@nazar-pc (Contributor):

I see. And target_feature in Rust doesn't seem to expose v even in Nightly. Very unfortunate.

That said, you could still check for features in an OS-specific way when compiling for that OS.
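On Linux, the OS-specific check could read `AT_HWCAP`. There, the RISC-V kernel reports each single-letter extension as bit `letter - 'A'`, so V is bit 21. The sketch keeps the bit test pure and leaves obtaining the value to the caller, for example `libc::getauxval(libc::AT_HWCAP)` from the `libc` crate on Linux. FreeBSD's `elf_aux_info` path would need its own check, and newer Linux kernels also offer the `riscv_hwprobe` syscall. The names here are hypothetical.

```rust
/// Linux/RISC-V encodes single-letter extension X in AT_HWCAP as bit (X - 'A').
const HWCAP_ISA_V: u64 = 1 << (b'V' - b'A');

/// True if the V bit is set in an AT_HWCAP value.
fn hwcap_has_v(hwcap: u64) -> bool {
    (hwcap & HWCAP_ISA_V) != 0
}

/// Hypothetical runtime probe for the dispatch logic: `None` when the
/// caller has no hwcap source (e.g. bare metal), otherwise the V bit.
fn rvv_from_hwcap(hwcap: Option<u64>) -> Option<bool> {
    hwcap.map(hwcap_has_v)
}
```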

@MahnoKropotkinvich force-pushed the dev-rvv branch 5 times, most recently from 61ddf51 to 0b6eb2e on March 10, 2026 05:27
drop(f);

let mut build = cc::Build::new();
build.flag("-march=rv64gcv");
@nazar-pc (Contributor):

This is certainly testing something, but it is testing rv64gcv rather than the actual target the code will be compiled for. For example, rv64imv should theoretically work too, and maybe even rv64izve64x is sufficient.

Even though it isn't possible to check for the vector extension even in Nightly Rust right now, this would serve as documentation of what the code actually uses. Listing the intrinsics above is already good, but listing the actual extensions would be even better.
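To make the required extensions explicit in build.rs, as suggested, the build could append a documented extension list to a minimal base ISA string and try candidates in order. This is a sketch under stated assumptions: `zve64x` comes from the reviewer's example and is not a verified minimum for this backend. `RVV_EXTENSIONS`, `march_with`, and `first_accepted` are hypothetical names.

```rust
/// Hypothetical: extensions the RVV backend is assumed to need, kept next to
/// the code as documentation. Not a verified minimum.
const RVV_EXTENSIONS: &[&str] = &["zve64x"];

/// Append multi-letter extensions to a base ISA string,
/// e.g. ("rv64i", ["zve64x"]) -> "rv64i_zve64x".
fn march_with(base: &str, exts: &[&str]) -> String {
    let mut march = String::from(base);
    for ext in exts {
        march.push('_');
        march.push_str(ext);
    }
    march
}

/// Pick the first candidate the compiler accepts. In build.rs, `accepts` could
/// wrap `build.is_flag_supported(&format!("-march={m}")).unwrap_or(false)`.
fn first_accepted<'a>(candidates: &'a [String], accepts: impl Fn(&str) -> bool) -> Option<&'a String> {
    candidates.iter().find(|m| accepts(m.as_str()))
}
```

Keeping the base ISA minimal means the flag documents only what the vector code needs, instead of implicitly enabling everything in a profile such as rva23u64.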



2 participants