Add support for RVV 1.0 by MahnoKropotkinvich · Pull Request #542 · BLAKE3-team/BLAKE3

MahnoKropotkinvich · 2026-02-17T05:46:22Z

Add a RISC-V V extension (RVV) backend.

The RVV-specific code is modeled on the existing ARM NEON implementation. It is written mainly in C because Rust's RVV intrinsics are not yet fully implemented. It may look odd that all vector variables are declared individually instead of as a 16-element array. This is because in RVV, vuint32m1_t is a sizeless type, whereas in ARM NEON uint32x4_t is a 16-byte value type, and arrays cannot be constructed from sizeless types.

RVV vector registers have a variable length: VLEN is implementation-defined and not fixed by the ISA. Rather than hardcoding a lane count, this implementation adapts to the hardware's actual VLEN by querying vsetvlmax at runtime to determine the SIMD degree.

MAX_SIMD_DEGREE is currently set to 16, which covers VLEN up to 512 bits. If future hardware supports VLEN > 512, users will need to raise MAX_SIMD_DEGREE in blake3_impl.h (C side) and platform.rs (Rust side) accordingly and recompile.
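A small sketch of how VLEN, the SIMD degree, and MAX_SIMD_DEGREE relate, assuming the kernel works on 32-bit elements at LMUL=1 (the e32m1 configuration that `__riscv_vsetvlmax_e32m1()` reports), so one register holds VLEN/32 lanes. The helper names are hypothetical; the real code obtains the degree from vsetvlmax at runtime rather than from a VLEN argument.

```rust
/// BLAKE3 operates on 32-bit words, so SEW = 32.
const SEW_BITS: usize = 32;

/// Hypothetical helper: lanes per vector register at LMUL=1, i.e. what
/// vsetvlmax for e32m1 would return on hardware with this VLEN.
fn simd_degree_for_vlen(vlen_bits: usize) -> usize {
    vlen_bits / SEW_BITS
}

/// Hypothetical helper: the largest VLEN that buffers sized for
/// `max_simd_degree` lanes can accommodate.
fn max_supported_vlen(max_simd_degree: usize) -> usize {
    max_simd_degree * SEW_BITS
}

/// True if the runtime degree for this VLEN fits within the compile-time bound.
fn vlen_fits(vlen_bits: usize, max_simd_degree: usize) -> bool {
    simd_degree_for_vlen(vlen_bits) <= max_simd_degree
}
```

Under this assumption, a bound of 16 gives 512 bits, matching the limit stated above, and VLEN = 128 hardware runs at degree 4.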

The following benchmarks were run on an SG2044 SoC (64 T-HEAD C920 cores). The RVV speedup shrinks as the thread count grows (from 2.36x single-threaded to 1.55x at 64 threads), as threading overhead and memory-bandwidth contention increasingly dominate over per-core compute gains.

| Threads | Size | Portable time | Portable throughput | RVV time | RVV throughput | Speedup |
|---|---|---|---|---|---|---|
| 1 | 10M | 65 ms | 0.15 GB/s | 30 ms | 0.32 GB/s | 2.16x |
| 1 | 64M | 385 ms | 0.16 GB/s | 166 ms | 0.37 GB/s | 2.31x |
| 1 | 256M | 1520 ms | 0.16 GB/s | 647 ms | 0.38 GB/s | 2.35x |
| 1 | 1G | 6070 ms | 0.16 GB/s | 2570 ms | 0.38 GB/s | 2.36x |
| 4 | 256M | 395 ms | 0.63 GB/s | 180 ms | 1.38 GB/s | 2.19x |
| 4 | 1G | 1560 ms | 0.63 GB/s | 701 ms | 1.42 GB/s | 2.23x |
| 4 | 4G | 6270 ms | 0.63 GB/s | 2780 ms | 1.43 GB/s | 2.24x |
| 8 | 256M | 210 ms | 1.19 GB/s | 102 ms | 2.45 GB/s | 2.05x |
| 8 | 1G | 819 ms | 1.22 GB/s | 389 ms | 2.57 GB/s | 2.10x |
| 8 | 4G | 3240 ms | 1.23 GB/s | 1520 ms | 2.62 GB/s | 2.12x |
| 16 | 256M | 117 ms | 2.13 GB/s | 63 ms | 3.96 GB/s | 1.85x |
| 16 | 1G | 443 ms | 2.25 GB/s | 228 ms | 4.38 GB/s | 1.94x |
| 16 | 4G | 1720 ms | 2.31 GB/s | 875 ms | 4.57 GB/s | 1.97x |
| 32 | 256M | 74 ms | 3.37 GB/s | 46 ms | 5.43 GB/s | 1.60x |
| 32 | 1G | 257 ms | 3.89 GB/s | 150 ms | 6.66 GB/s | 1.71x |
| 32 | 4G | 974 ms | 4.10 GB/s | 545 ms | 7.33 GB/s | 1.78x |
| 64 | 256M | 59 ms | 4.23 GB/s | 46 ms | 5.43 GB/s | 1.28x |
| 64 | 1G | 179 ms | 5.58 GB/s | 119 ms | 8.40 GB/s | 1.50x |
| 64 | 4G | 619 ms | 6.46 GB/s | 398 ms | 10.05 GB/s | 1.55x |
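The speedup and throughput columns can be re-derived from the timings. This sketch assumes the sizes are binary (1M = 2^20 bytes, 1G = 2^30 bytes) and that "GB/s" means 2^30 bytes per second; under that assumption the 64-thread 4G row reproduces the reported 10.05 GB/s. Some other rows differ from the formula in the last digit, which may come from rounding of the millisecond timings.

```rust
/// Assumed binary units for the "M" and "G" sizes in the table.
const MIB: f64 = 1_048_576.0; // 2^20 bytes
const GIB: f64 = 1_073_741_824.0; // 2^30 bytes

/// Speedup of RVV over portable: ratio of wall-clock times.
fn speedup(portable_ms: f64, rvv_ms: f64) -> f64 {
    portable_ms / rvv_ms
}

/// Throughput in GiB/s for `bytes` hashed in `ms` milliseconds.
fn throughput_gib_s(bytes: f64, ms: f64) -> f64 {
    (bytes / GIB) / (ms / 1000.0)
}
```

For example, the single-threaded 1G row gives 6070 / 2570 ≈ 2.36x.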

nazar-pc

See also #484 for related discussion; there is a branch with an assembly implementation that Jack was working on there.

nazar-pc · 2026-02-28T20:22:17Z

build.rs

+    // Try rva23u64 first (RVA23 profile includes RVV 1.0), then fall back to rv64gcv.
+    // This matches the priority in CMakeLists.txt.
+    let march_flag = if build.is_flag_supported("-march=rva23u64").unwrap_or(false) {
+        "-march=rva23u64"
+    } else {
+        "-march=rv64gcv"
+    };


rva23u64 includes the vector extension, but it also includes a lot more, just like rv64gcv. I don't think this is the right approach.

My use case could benefit from this PR, but my embedded-ish target implements only Zve64x (along with the E extension, so there are only 16 general-purpose registers).

Instead of changing the base architecture, consider enabling only the minimum set of extensions this code uses and nothing else; otherwise, in use cases like mine, things will blow up unexpectedly at runtime.
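One way to act on this suggestion is to start from a base ISA string and append only the extensions the RVV code needs, rather than switching wholesale to rva23u64 or rv64gcv. In this sketch the helper name, the base string, and the extension list are all assumptions supplied by the caller; which extensions are actually sufficient (for example, whether Zve32x or Zve64x covers every intrinsic used) is still an open question in this thread. Multi-letter extensions are joined with underscores, as in `rv64imac_zve64x`.

```rust
/// Hypothetical helper: build a `-march=` flag from a base ISA string plus a
/// minimal list of multi-letter extensions, skipping ones already present.
fn minimal_march(base: &str, extra_exts: &[&str]) -> String {
    let mut isa = base.to_ascii_lowercase();
    for ext in extra_exts {
        let ext = ext.to_ascii_lowercase();
        // Multi-letter extensions are separated by '_' in ISA strings.
        let already_present = isa.split('_').any(|part| part == ext);
        if !already_present {
            isa.push('_');
            isa.push_str(&ext);
        }
    }
    format!("-march={isa}")
}
```

For instance, `minimal_march("rv64imac", &["zve64x"])` yields `-march=rv64imac_zve64x`, which enables nothing beyond what the target already has plus the listed vector subset.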

nazar-pc · 2026-02-28T20:25:25Z

build.rs

+fn is_riscv64() -> bool {
+    let arch = &target_components()[0];
+    arch == "riscv64gc" || arch == "riscv64a23"
+}
+
+fn is_rvv() -> bool {
+    // Explicit RVV feature flag
+    if defined("CARGO_FEATURE_RVV") {
+        return true;
+    }
+
+    // riscv64a23 target has built-in RVV support
+    let arch = &target_components()[0];
+    if arch == "riscv64a23" {
+        return true;
+    }
+
+    false
+}


This is very limited. It'd be nicer to detect which extensions are available rather than parsing the target triple for a few known-good values. I have a custom target called riscv64-unknown-none-abundance that will support the vector extension in the future, but this detection logic won't be able to take advantage of it automatically.

The fact that an override exists helps, but it'd be much nicer if feature detection just worked out of the box.
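A sketch of detection based on what Cargo reports for the target instead of a list of known triples: build scripts receive CARGO_CFG_TARGET_ARCH (`riscv64` for any riscv64 triple, custom ones included) and CARGO_CFG_TARGET_FEATURE (a comma-separated list of enabled target features). Whether `v` or a `zve*` feature actually appears in that list depends on the toolchain, and later in this thread it is noted that Rust does not currently expose `v`, so in practice this would fall back to `false` and the explicit override. The helper names are hypothetical; the core function takes plain strings so it can be tested outside a build script.

```rust
use std::env;

/// Hypothetical helper: decide RVV support from Cargo's cfg values
/// instead of matching specific target-triple names.
fn rvv_from_cfg(target_arch: &str, target_features: &str) -> bool {
    target_arch == "riscv64"
        && target_features
            .split(',')
            .any(|f| f == "v" || f.starts_with("zve"))
}

/// In build.rs, Cargo sets these variables for the target being compiled.
fn rvv_enabled_for_target() -> bool {
    let arch = env::var("CARGO_CFG_TARGET_ARCH").unwrap_or_default();
    let features = env::var("CARGO_CFG_TARGET_FEATURE").unwrap_or_default();
    rvv_from_cfg(&arch, &features)
}
```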

MahnoKropotkinvich · 2026-03-03T07:04:34Z

@nazar-pc Thanks for reviewing.
So the concerns are that:

- rva23u64 is too broad and could pull in instructions that many platforms don't implement;
- the current ISA detection logic is too limited.

I'm currently looking into more flexible detection logic that enables the RISC-V V extension on supported platforms without pulling in unimplemented instructions. Since neither CMake nor Cargo supports this natively, I may write some ad-hoc Rust/C code to handle it.

MahnoKropotkinvich · 2026-03-04T11:10:15Z

@nazar-pc I've updated the detection logic.
For native builds, both CMake and Cargo now test whether RVV code can be compiled and run successfully.
For cross-compilation, since it's difficult to test target ISA features from the host, users must enable RVV support manually.

I also modified the is_rvv() logic; I think it's now fine to assume a RISC-V 64 platform whenever the target string starts with the riscv64 prefix.
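The native-versus-cross decision described here can be sketched with the HOST and TARGET variables Cargo passes to build scripts: a compile-and-run probe only makes sense when they match, and otherwise the user has to opt in. The helper names are hypothetical, and the prefix match simply mirrors the riscv64-prefix rule described above.

```rust
use std::env;

/// Mirrors the "target string starts with riscv64" rule described above.
fn is_riscv64_triple(target: &str) -> bool {
    target.starts_with("riscv64")
}

/// Hypothetical helper: only compile *and run* an RVV probe when the build
/// host is the target, since a cross-compiled binary can't run here.
fn should_run_rvv_probe(host: &str, target: &str) -> bool {
    host == target && is_riscv64_triple(target)
}

/// Build-script wrapper reading the values Cargo provides.
fn should_run_rvv_probe_from_env() -> bool {
    let host = env::var("HOST").unwrap_or_default();
    let target = env::var("TARGET").unwrap_or_default();
    should_run_rvv_probe(&host, &target)
}
```

Note that on a native build the probe reflects the build machine's CPU, so a binary later copied to a RISC-V machine without RVV would still need the portable path.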

nazar-pc · 2026-03-04T11:48:54Z

build.rs

+        if test_rvv_runtime_support() {
+            return true;
+        } else {
+            println!("cargo:warning=No RVV support detected, using portable implementation");


I'm not sure printing a warning is a good idea. Many projects reject warnings, and this one seems unavoidable on RISC-V, which will be a problem for external users.

nazar-pc · 2026-03-04T11:51:24Z

build.rs

+    let test_code = b"
+#include <riscv_vector.h>
+int main() {
+  size_t vl = __riscv_vsetvlmax_e32m1();


This tests only a single intrinsic. Does it cover all the extensions used by this implementation, or do more need to be added?

Now all functions used in the arch-specific code are tested.
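A sketch of a broader probe: the build script writes a small C program that exercises several intrinsics instead of only `__riscv_vsetvlmax_e32m1`. The particular set below (vmv, vadd, vxor, vsrl, vmv.x.s) is an illustrative assumption covering typical BLAKE3 round operations, not the list the PR actually tests; the file name and helper are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Illustrative C probe: each statement exercises one RVV intrinsic
/// (an assumed subset, not the PR's actual list).
const RVV_PROBE_C: &str = r#"#include <riscv_vector.h>
int main(void) {
  size_t vl = __riscv_vsetvlmax_e32m1();
  vuint32m1_t a = __riscv_vmv_v_x_u32m1(1u, vl);
  vuint32m1_t b = __riscv_vadd_vv_u32m1(a, a, vl);
  b = __riscv_vxor_vv_u32m1(b, a, vl);
  b = __riscv_vsrl_vx_u32m1(b, 7, vl);
  return (int)__riscv_vmv_x_s_u32m1_u32(b);
}
"#;

/// Writes the probe into `dir` (OUT_DIR in a real build script) and returns its path.
fn write_rvv_probe(dir: &Path) -> io::Result<PathBuf> {
    let path = dir.join("rvv_probe.c");
    fs::write(&path, RVV_PROBE_C)?;
    Ok(path)
}
```

The arithmetic ends in ((1 + 1) ^ 1) >> 7 = 0, so when the probe compiles and runs correctly it also exits with status 0.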

nazar-pc · 2026-03-04T11:52:28Z

build.rs

+    let cc = env::var("CC").unwrap_or_else(|_| "gcc".to_string());
+    let compile = Command::new(&cc)
+        .args(&["-march=rv64gcv", &test_c, "-o", &test_bin])
+        .output();


Why not compile with https://docs.rs/cc instead?

cc does not compile C sources into executables; it only produces static libraries.

But I still use cc to avoid setting up environment variables manually.
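Since cc only produces libraries, one option is to let it resolve the compiler and then invoke that compiler directly to link the probe executable. With the cc crate this would start from `cc::Build::new().get_compiler().to_command()`, which returns a `std::process::Command` preconfigured with the detected compiler. To stay self-contained, the sketch below takes the compiler path as a parameter instead; the helper names are hypothetical and the `-march` value is whatever the build script settles on.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: command line that compiles and links a probe executable.
fn probe_compile_command(compiler: &Path, march: &str, src: &Path, exe: &Path) -> Command {
    let mut cmd = Command::new(compiler);
    cmd.arg(march).arg(src).arg("-o").arg(exe);
    cmd
}

/// Compile the probe, then run it; any failure is treated as "no RVV".
fn probe_succeeds(compiler: &Path, march: &str, src: &Path, exe: &Path) -> bool {
    let compiled = probe_compile_command(compiler, march, src, exe)
        .status()
        .map(|s| s.success())
        .unwrap_or(false);
    compiled
        && Command::new(exe)
            .status()
            .map(|s| s.success())
            .unwrap_or(false)
}
```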

nazar-pc · 2026-03-04T11:55:35Z

src/platform.rs

+        // We use 16 as the upper bound, when future hardwares with degree > 8 released
+        // alter this constant


Virtual targets often have 2048-bit vector registers, including the one I'm working with. What does this constant affect in practice?

MAX_SIMD_DEGREE sets the sizes of static buffers in blake3.c (C library path) and platform.rs (Rust path). If it is smaller than the runtime simd_degree(), blake3_hash_many writes past the end of the out buffer, causing a buffer overflow. If it is larger than needed, the only cost is extra stack space: 64 * 32 = 2 KiB for the out buffers, plus a similar amount for ArrayVec capacities. I initially set it to 16, but 64 covers VLEN up to 2048 bits and the stack overhead is negligible.
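The trade-off above can be sketched in Rust: an output buffer sized by the compile-time MAX_SIMD_DEGREE, plus an explicit check against the runtime degree before handing out the region a hash_many call would write. OUT_LEN = 32 is BLAKE3's output length in bytes; the helper names are hypothetical. In C an oversized degree silently overflows; here the check turns it into a `None` (plain slicing would panic instead).

```rust
/// BLAKE3 output (chaining value) length in bytes.
const OUT_LEN: usize = 32;
/// Compile-time upper bound on the SIMD degree (64 covers VLEN up to 2048 bits at e32m1).
const MAX_SIMD_DEGREE: usize = 64;

/// Stack cost of one `out` buffer for a given bound.
const fn out_buffer_bytes(max_degree: usize) -> usize {
    max_degree * OUT_LEN
}

/// Hypothetical helper: the part of the fixed buffer that a hash_many call with
/// `degree` inputs would write, or None if the buffer is too small for it.
fn out_region(buf: &mut [u8; MAX_SIMD_DEGREE * OUT_LEN], degree: usize) -> Option<&mut [u8]> {
    if degree > MAX_SIMD_DEGREE {
        return None;
    }
    Some(&mut buf[..degree * OUT_LEN])
}
```

With MAX_SIMD_DEGREE = 64 this is 2048 bytes per out buffer, the 2 KiB figure quoted above.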

nazar-pc · 2026-03-04T11:57:39Z

build.rs

        build_wasm32_simd();
    }

+    if is_rvv() && is_big_endian() {


is_rvv() should check for endianness too in the autodetection case. Otherwise it'll be impossible to compile for big-endian RISC-V targets, however uncommon they are.
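A sketch of folding the endianness check into autodetection, using the CARGO_CFG_TARGET_ENDIAN variable Cargo sets for build scripts (`little` or `big`). The helper names are hypothetical; the idea is simply that big-endian RISC-V targets fall back to the portable implementation instead of failing to build.

```rust
use std::env;

/// Hypothetical helper: RVV autodetection only applies to little-endian riscv64.
fn rvv_autodetect_allowed(target_arch: &str, target_endian: &str) -> bool {
    target_arch == "riscv64" && target_endian == "little"
}

/// Build-script wrapper reading the values Cargo provides for the target.
fn rvv_autodetect_allowed_from_env() -> bool {
    let arch = env::var("CARGO_CFG_TARGET_ARCH").unwrap_or_default();
    let endian = env::var("CARGO_CFG_TARGET_ENDIAN").unwrap_or_default();
    rvv_autodetect_allowed(&arch, &endian)
}
```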

nazar-pc · 2026-03-04T12:01:59Z

build.rs

+    if is_riscv64() && is_rvv() {
+        println!("cargo:rustc-cfg=blake3_rvv");
+        build_rvv_c_intrinsics();
+    }


I think it should always build RVV support unless the pure feature is selected. When RVV support is detected at compile time, the corresponding implementation would be used unconditionally; if not, I think it still makes sense to do runtime CPU feature detection, unless that is impossible.

Unfortunately, although RISC-V does provide x86 cpuid-like functionality (the misa register), it is accessible only in M-mode. Alternatively, we can query the OS for supported ISA extensions, but the query API varies by OS: on Linux it's getauxval(AT_HWCAP), and on FreeBSD it's elf_aux_info. I also think we shouldn't assume BLAKE3 will never run on bare metal, so runtime ISA dispatch would have to account for bare-metal, Linux, and FreeBSD environments, which is too complicated.
See also: https://news.ycombinator.com/item?id=24002931

I see. And target_feature in Rust doesn't seem to expose v even in Nightly. Very unfortunate.

That said, you could still check for features conditionally, in an OS-specific way, when compiling for that OS.
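As an illustration of the OS-specific route, on Linux only: the kernel reports single-letter RISC-V extensions in AT_HWCAP as bits indexed by letter ('A' = bit 0), so V is bit 21. This sketch declares getauxval directly to avoid a libc dependency, assumes a kernel recent enough to report V there, and deliberately returns false on every other OS and on bare metal, which is exactly the fragmentation described above. The helper names are hypothetical.

```rust
/// Linux auxiliary-vector key for the hardware-capability bitmask.
#[allow(dead_code)]
const AT_HWCAP: u64 = 16;

/// On RISC-V Linux, single-letter extension X is reported as bit (X - 'A').
fn hwcap_has_letter(hwcap: u64, letter: char) -> bool {
    let bit = (letter.to_ascii_uppercase() as u32).wrapping_sub('A' as u32);
    bit < 26 && (hwcap & (1u64 << bit)) != 0
}

#[cfg(all(target_os = "linux", target_arch = "riscv64"))]
fn rvv_available_at_runtime() -> bool {
    use std::os::raw::c_ulong;
    unsafe extern "C" {
        fn getauxval(kind: c_ulong) -> c_ulong;
    }
    // SAFETY: getauxval has no preconditions; it returns 0 for unknown keys.
    let hwcap = unsafe { getauxval(AT_HWCAP as c_ulong) } as u64;
    hwcap_has_letter(hwcap, 'V')
}

#[cfg(not(all(target_os = "linux", target_arch = "riscv64")))]
fn rvv_available_at_runtime() -> bool {
    // FreeBSD (elf_aux_info), bare metal, etc. are not handled in this sketch.
    false
}
```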

nazar-pc · 2026-03-10T07:21:27Z

build.rs

+    drop(f);
+
+    let mut build = cc::Build::new();
+    build.flag("-march=rv64gcv");


This is certainly testing something, but it tests rv64gcv rather than the actual target the code will be compiled for. For example, rv64imv should theoretically work too, and maybe even rv64izve64x is sufficient.

Even though Nightly Rust can't currently check for the vector extension, this would serve as documentation of what the code actually uses. Testing the intrinsics above is already good, but testing the actual extensions would be even better.
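To document which extensions the code relies on, the build script could also inspect the ISA string it compiles with instead of assuming rv64gcv. A rough sketch of such a check: it treats either the single-letter V or any Zve* embedded-vector extension as "has vectors", ignores extension versions, and does not understand profile names like rva23u64. Whether Zve32x/Zve64x is actually sufficient for this code is the open question raised above, so the accepted set is an assumption, and the helper name is hypothetical.

```rust
/// Hypothetical helper: does an ISA string like "rv64gcv" or
/// "rv64imac_zve64x" include some form of the vector extension?
fn isa_has_vector(isa: &str) -> bool {
    let isa = isa.to_ascii_lowercase();
    // Strip the base ("rv32"/"rv64"); profiles such as "rva23u64" are not handled.
    let rest = match isa.strip_prefix("rv64").or_else(|| isa.strip_prefix("rv32")) {
        Some(rest) => rest,
        None => return false,
    };
    let mut parts = rest.split('_');
    // First segment: single-letter extensions, e.g. "gcv" or "imac".
    let single = parts.next().unwrap_or("");
    if single.contains('v') {
        return true;
    }
    // Remaining segments: multi-letter extensions, e.g. "zve64x", "zba".
    parts.any(|ext| ext.starts_with("zve"))
}
```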

nazar-pc · 2026-03-10T07:22:17Z

build.rs

+    if is_riscv64() && is_rvv() {
+        println!("cargo:rustc-cfg=blake3_rvv");
+        build_rvv_c_intrinsics();
+    }


I see. And target_feature in Rust doesn't seem to expose v even in Nightly. Very unfortunate.

That said, you could still check for features conditionally, in an OS-specific way, when compiling for that OS.

MahnoKropotkinvich force-pushed the dev-rvv branch 2 times, most recently from 598dea9 to 53ab2ed Compare February 24, 2026 12:37

MahnoKropotkinvich changed the title ~~WIP: Add support for RVV 1.0~~ Add support for RVV 1.0 Feb 24, 2026

MahnoKropotkinvich force-pushed the dev-rvv branch 8 times, most recently from 54233d6 to f46a1cc Compare February 25, 2026 16:54

MahnoKropotkinvich marked this pull request as draft February 25, 2026 17:07

Add support for RVV 1.0

60dd32b

MahnoKropotkinvich force-pushed the dev-rvv branch from f46a1cc to 60dd32b Compare February 26, 2026 03:05

MahnoKropotkinvich marked this pull request as ready for review February 26, 2026 05:18

MahnoKropotkinvich mentioned this pull request Feb 27, 2026

GSoC (2026): Implement Custom Section Parsing and Branch Hinting proposal WasmEdge/WasmEdge#4517

Open

nazar-pc reviewed Feb 28, 2026


update build system

34b31d5

MahnoKropotkinvich force-pushed the dev-rvv branch from 839c344 to 34b31d5 Compare March 4, 2026 11:15

nazar-pc reviewed Mar 4, 2026


MahnoKropotkinvich force-pushed the dev-rvv branch 5 times, most recently from 61ddf51 to 0b6eb2e Compare March 10, 2026 05:27

fix alignment & update build system

eecd748

MahnoKropotkinvich force-pushed the dev-rvv branch from 0b6eb2e to eecd748 Compare March 10, 2026 06:12

nazar-pc reviewed Mar 10, 2026


