RISC-V: `riscv_hwprobe`-based feature detection on Linux / Android #1770
d0b629a
to
3c93a6a
Compare
This looks good to me.
The AArch64 code puts things referenced by multiple OSes in https://github.com/rust-lang/stdarch/blob/master/crates/std_detect/src/detect/os/aarch64.rs (see stdarch/crates/std_detect/src/detect/mod.rs, lines 54 to 57 and lines 60 to 63 in 6780277).
Although not yet included in std_detect, FreeBSD and OpenBSD also support auxv-based detection (via elf_aux_info) on RISC-V, so we can use imply_features on those OSes as well. https://github.com/freebsd/freebsd-src/blob/main/sys/riscv/include/elf.h#L75
3c93a6a
to
a40c4e6
Compare
Thanks for your suggestion! I'll try it and will likely push the new version tomorrow.
0b3e930
to
168531e
Compare
168531e
to
317b721
Compare
Adopted @taiki-e's suggestion. It's now no longer a draft but a complete proposal.
2b93ff9
to
c98bf97
Compare
1. Use canonical kernel.org repository instead of the GitHub mirror. 2. Refer to the fixed commit to guarantee access. 3. Use `uapi` part to ensure that the feature detection is primarily intended for user-mode programs.
This commit makes handling of the base ISA a separate block. Co-Authored-By: Taiki Endo <[email protected]>
Because this function will no longer be auxvec-only, this commit adds a comment to mark the auxvec-based part. It *does not* add a comment to the "base ISA" part because it may also use `riscv_hwprobe`-based results.
This commit prepares common infrastructure for extension implication by removing the `enable_features` closure, which makes each feature test longer (because it needs an extra `value` argument each time we test a feature). This comes with the overhead of enabling each feature separately, but that is later mitigated by the OS-independent extension implication logic.
As Taiki Endo pointed out, there's a problem if we continue using `target_pointer_width` values to detect an architecture because: * There are separate `target_arch`s already and * There is an experimental ABI (not ratified though): RV64ILP32. cf. <https://lpc.events/event/17/contributions/1475/attachments/1186/2442/rv64ilp32_%20Run%20ILP32%20on%20RV64%20ISA.pdf> Co-Authored-By: Taiki Endo <[email protected]>
The "A" extension comprises instructions provided by the "Zaamo" and "Zalrsc" extensions. To prepare for the "Zacas" extension (which provides compare-and-swap instructions and is discoverable from Linux), which depends on the "Zaamo" extension, it would be better to support those subsets.
The "B" extension was once abandoned (instead, it was ratified as a collection of "Zb*" extensions). However, it was later redefined and ratified as a superset of the "Zba", "Zbb" and "Zbs" extensions (but not "Zbc", carry-less multiplication, because of its limited benefits and implementation cost). Although non-functional (because feature detection is not yet implemented), this commit provides the foundation to implement this extension (along with straightforward documentation showing the subsets of "B").
This is ported from Taiki Endo's branch and sorted by the `@FEATURE` order as in `src/detect/arch/riscv.rs`. Co-Authored-By: Taiki Endo <[email protected]>
c98bf97
to
11ca67f
Compare
PR version 9 is rebased against the latest commit (after #1769 is merged).
54cc106
to
c2cdcc8
Compare
Author Re-Review Complete (Version 11)
All of the related extensions inside the ISA manual are reviewed and found that all implications inside
c2cdcc8
to
c46e8b7
Compare
Author Re-Re-Review Complete (Version 12)
And concluded that I am still wondering how I could miss that section (defining its own I double checked
This commit adds the OS-independent extension implication logic for RISC-V.
It implements:
1. Regular implication (A → B)
   a. "the extension A implies the extension B"
   b. "the extension A requires the extension B"
   c. "the extension A depends on the extension B"
2. Extension group or shorthand (A == B1 & B2...)
   a. "the extension A is shorthand for other extensions: B1, B2..."
   b. "the extension A comprises instructions provided by B1, B2..."
   This is implemented as (A → B1 & B2... + B1 & B2... → A), where the former is a regular implication as required by specifications and the latter is a "reverse" implication to improve usability.
and prepares for:
3. Implication with multiple requirements (A1 & A2... → B)
   a. "A1 + A2 implies B"
   b. (implicitly used to implement reverse implication of case 2)
Although it uses macros and iterators, good optimizers turn the series of implications into fast bit-manipulation operations.
In case 2 (extension group or shorthand, where a superset extension is just a collection of other subextensions and provides no features by itself), specifications do specify that an extension group implies its members but not vice versa. However, implying an extension group from its members improves the usability of feature detection (especially when the feature provider does not report the existence of such an extension group but reports the existence of its members). A similar "reverse implication" for RISC-V is implemented in LLVM.
Case 3 is implicitly used to implement the reverse implication of case 2, but there is another use case: implication with multiple requirements, as with the "Zcf" and "Zcd" extensions (not yet implemented in this crate).
To handle extension groups perfectly, we need to loop the implication several times (until the flags converge; normally 2 times and up to 4 times when we add most of the `riscv_hwprobe`-based features). To make the implementation of that loop possible, `cache::Initializer` is modified to implement `PartialEq` and `Eq`.
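The looping scheme described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Features` type, the bit assignments, and the `imply`/`group` methods are hypothetical stand-ins for `cache::Initializer` and the `imply!`/`group!` macros, and only a few of the relations named in this conversation are encoded.

```rust
/// Hypothetical feature set (a stand-in for `cache::Initializer`).
/// Deriving `PartialEq`/`Eq` lets the loop detect convergence.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Features(u32);

// Hypothetical bit positions, chosen for this sketch only.
const A: u32 = 1 << 0;
const ZAAMO: u32 = 1 << 1;
const ZALRSC: u32 = 1 << 2;
const ZACAS: u32 = 1 << 3;
const B: u32 = 1 << 4;
const ZBA: u32 = 1 << 5;
const ZBB: u32 = 1 << 6;
const ZBS: u32 = 1 << 7;

impl Features {
    fn has_all(self, mask: u32) -> bool {
        self.0 & mask == mask
    }

    /// Cases 1 and 3: if every bit in `from` is set, set every bit in `to`.
    fn imply(&mut self, from: u32, to: u32) {
        if self.has_all(from) {
            self.0 |= to;
        }
    }

    /// Case 2: the group implies its members, and the members together
    /// imply the group ("reverse" implication).
    fn group(&mut self, group: u32, members: u32) {
        self.imply(group, members);
        self.imply(members, group);
    }
}

/// Applies all implications until nothing changes. Returns the final set
/// and the number of passes, counting the last pass that changed nothing.
fn imply_features(mut f: Features) -> (Features, u32) {
    let mut passes = 0;
    loop {
        let prev = f;
        passes += 1;
        f.group(A, ZAAMO | ZALRSC);
        f.imply(ZACAS, ZAAMO);
        f.group(B, ZBA | ZBB | ZBS);
        // Bits are only ever set, never cleared, so this must terminate.
        if f == prev {
            return (f, passes);
        }
    }
}
```

With only `ZACAS | ZALRSC` set, the first pass adds `ZAAMO` and the second pass can then add `A` through the reverse implication, which is why a single pass is not enough in general.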
This commit implements `riscv_hwprobe`-based feature detection as available on newer versions of the Linux kernel. It also queries whether the vector extensions are enabled using `prctl`, but because this is not supported on QEMU's userland emulator (as of version 9.2.3), it uses the auxiliary vector as a fallback. Currently, all extensions discoverable from the Linux kernel version 6.14 and related extension groups (except "Supm", which reports the existence of `prctl`-based pointer masking control and is too OS-dependent) are implemented. Co-Authored-By: Taiki Endo <[email protected]>
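The `prctl`-then-auxiliary-vector fallback could look roughly like the sketch below. It is written as a pure function so that it does not depend on the target OS; the function name and its parameters are hypothetical, and the constant values are my reading of the Linux uapi headers, so treat them as assumptions. The PR's actual decision logic may differ.

```rust
// Values as I understand them from Linux's uapi headers (assumptions).
const PR_RISCV_V_VSTATE_CTRL_CUR_MASK: i64 = 0x3;
const PR_RISCV_V_VSTATE_CTRL_ON: i64 = 2;
// AT_HWCAP bit for the single-letter "V" extension: 'V' - 'A' == 21.
const HWCAP_ISA_V: u64 = 1 << 21;

/// Hypothetical helper: decides whether vectors count as enabled.
///
/// `prctl_result` is the return value of `prctl(PR_RISCV_V_GET_CONTROL)`,
/// or `None` if the call could not be made. `at_hwcap` is the `AT_HWCAP`
/// value from the auxiliary vector.
fn vectors_enabled(prctl_result: Option<i64>, at_hwcap: u64) -> bool {
    match prctl_result {
        // prctl worked: trust the current thread's vector state.
        Some(ctrl) if ctrl >= 0 => {
            ctrl & PR_RISCV_V_VSTATE_CTRL_CUR_MASK == PR_RISCV_V_VSTATE_CTRL_ON
        }
        // prctl failed or is unavailable (for example under QEMU's userland
        // emulation): fall back to the state recorded at program startup.
        _ => at_hwcap & HWCAP_ISA_V != 0,
    }
}
```

For example, `vectors_enabled(Some(1), HWCAP_ISA_V)` is `false` because a successful `prctl` answer takes precedence over the auxiliary vector, while `vectors_enabled(None, HWCAP_ISA_V)` is `true`.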
c46e8b7
to
f2b6303
Compare
LGTM other than a minor nit!
Until in-kernel feature detection is implemented, runtime detection of privileged extensions is temporarily removed along with the features themselves, since none of these privileged features are stable. Co-Authored-By: Taiki Endo <[email protected]> Co-Authored-By: Amanieu d'Antras <[email protected]>
f2b6303
to
7087522
Compare
This PR implements full `riscv_hwprobe`-based feature detection on newer Linux kernels and implements OS-independent extension implication logic to make extension handling easier.
This PR is a superset of #1769.
This PR is based on #1762 by @taiki-e. Commits with @taiki-e's code (with or without modification) are marked by `Co-Authored-By`.
While the OS-independent logic in this PR uses iterators, I confirmed that LLVM is smart enough to optimize them into a series of bit-manipulation operations (multi-feature masking, then multi-feature enablement).
RFC: Where to Put `imply_features()`?
The only reason I originally made this PR a draft is that I was not sure where to put the new OS-independent RISC-V extension implication logic (`imply_features()`).
Responses
@taiki-e: suggested a path like `src/detect/os/riscv.rs`.
Decision
@taiki-e's suggestion is adopted here (on PR version 7).
RFC: Automatic Loops?
Is it allowed to derive `Eq` for `cache::Initializer`? This way, we can continue implication until the feature flags converge (it loops 2 times in normal cases, up to 4 times in mostly adversarial cases).
If it's not allowed, I proved that (in the current state) looping the `group!` definitions of `Zvk*` 3 times and the `group!` definitions of `Zk*` 2 times is sufficient to ensure convergence (I tentatively introduced manual loops in PR v4).
Decision
For now, I assume that it is okay to do that (on PR version 6).
Main Differences between #1762
Use key `RISCV_HWPROBE_KEY_IMA_EXT_0` after checking that `RISCV_HWPROBE_BASE_BEHAVIOR_IMA` is set on the key `RISCV_HWPROBE_KEY_BASE_BEHAVIOR`.
Because `RISCV_HWPROBE_KEY_IMA_EXT_0` lists extensions compatible with `RISCV_HWPROBE_BASE_BEHAVIOR_IMA`, the base behavior must be checked first (current Linux requires `RV[32|64]IMA` plus some extra features, but this check makes the resulting code much more robust if the Linux kernel decides to lower its requirements).
Use key `RISCV_HWPROBE_KEY_MISALIGNED_SCALAR_PERF`, not just `RISCV_HWPROBE_KEY_CPUPERF_0`.
@taiki-e's proposal only uses `RISCV_HWPROBE_KEY_CPUPERF_0`, but this key is considered deprecated (because it is incorrectly implemented as a bitmask).
Because some versions of the Linux kernel do not support `RISCV_HWPROBE_KEY_MISALIGNED_SCALAR_PERF`, `RISCV_HWPROBE_KEY_CPUPERF_0`-based checking of unaligned scalar memory access performance is kept as a fallback.
Attempt to enable vectors only once and use the most up-to-date information
This PR first uses `prctl` with `PR_RISCV_V_GET_CONTROL` (whether vector is enabled on the current thread when runtime feature detection is requested) and falls back to the auxiliary vector (whether a vector extension, `V`, is enabled when the program starts) as a workaround for QEMU's userland emulation (as of version 9.2.3).
This PR attempts to use the latest (up-to-date) information for vector enablement, which makes two differences when the program is run on the real Linux kernel:
And this PR won't try to use the auxiliary vector for `V` enablement once `riscv_hwprobe` is confirmed to be capable of checking vector extensions (to ensure that we test the vector status at only one point in time (either program startup or the first feature detection), not two (program startup and the first feature detection)).
Simplify feature enablement including uses of `enable_feature` through:
`enable_features` and the `value` argument,
`test` which tests the value of `RISCV_HWPROBE_KEY_IMA_EXT_0`).
The macros `imply!` and `group!` inside the new implication function make writing extension implications pretty easy, and making implication a separate function (`imply_features`) makes it unnecessary to compute various flags in the feature detection logic. It makes the logic hard to break when multiple feature detection mechanisms are used (see code snippet 1).
Complex implication logic also works as expected, as in code snippet 2.
Imply `I`, `M`, `A`, `Zicsr` and `Zifencei` when we find the `IMA` base using `riscv_hwprobe`.
For historical reasons, the `I` extension does not preserve backward compatibility but rather the reverse. The `I` extension in the ISA manual version 2.2 (RV32I/RV64I version 2.0) was split into three extensions, `I` (RV32I/RV64I version 2.1), `Zicsr` and `Zifencei`, plus one non-extension "Counters" (with a few amendments) in the ISA manual version 20190608, and "Counters" was ratified as two extensions, `Zicntr` (originally a part of the `I` extension) and `Zihpm` (the amended parts of the non-extension "Counters"), in the ISA manual version 20240411.
The thing is, Linux's base behavior defines that the `I` extension of the `IMA` base is that of the ISA manual version 2.2 (later split into various extensions). If `riscv_hwprobe` succeeds and reports the `IMA` base, the author chose to enable not just `I` but also `Zicsr` and `Zifencei`.
`Zifencei` was initially excluded because the Linux documentation states that `fence.i` is not expected to be run in user mode. Although no traps will be generated, its effect is unreliable unless SMP is disabled.
However, this is only the normal path. If Concurrent Modification and Execution of Instructions (CMODX) is enabled, `fence.i` can be valid in the Linux userland, making this implication useful in certain cases. So, it is implied as of PR version 9.
`Zihpm` is excluded for not being a part of the original `I` extension as in the ISA manual version 2.2.
`Zicntr` is excluded (in PR version 2 or later). While it seems safe to imply it, Linux 6.15 will include (as of rc1) the separate constant `RISCV_HWPROBE_EXT_ZICNTR` to detect the existence of the `Zicntr` extension. So, the author chose the more pessimistic assumption.
`Zicsr` should be safe (so it is implied) because Linux depends on the privileged architecture, which depends on the `Zicsr` extension. If no other extensions with CSRs are enabled, it is almost equivalent to not having the `Zicsr` extension.
Same as #1762
Code Snippets
1. From #1762: Multiple manual feature enablement
It enables `V` and its subsets. On its own, that would be okay, but:
Although this is commented out in #1762 (which is okay), handling separate vector subsets will need additional logic. Just uncommenting the code here makes it partially incorrect (depending on how `has_v` is computed) because `Zicsr` is not implied by `Zve32x`.
This PR removes the need to write extension implication logic multiple times.
2. From this PR: Complex implication
It accurately represents what features to enable depending on the situation.
History
Version 1 (2025-04-11)
The first proposal.
Version 2 (2025-04-11)
See diff
`IMA` base.
Excluded the `Zicntr` extension from implication.
`has_v` inside fine-grained detection logic using `riscv_hwprobe` to `has_vectors`
`is_vectors_enabled` is a more accurate name, but it should be sufficient (denoting whether the vector extension or its subset(s) are enabled).
It sets `V` purely depending on the auxiliary vector only if no fine-grained vector extension detection is available (PR v1 stated that this occurs when `riscv_hwprobe` is unavailable, but that was incorrect).
Version 3 (2025-04-11)
See diff
There's still a case where implying without a loop won't converge. We need to loop a portion several times.
Version 4 (2025-04-11)
See diff
But this is a tentative change. If I'm allowed to derive `Eq` for `cache::Initializer`, the OS-independent RISC-V extension implication logic will get a lot simpler (loop until it converges; termination is guaranteed because we never remove features).
Version 5 (2025-04-12)
See diff
`Supm` extension removed (for being too OS-dependent; thanks @Amanieu!).
Version 6 (2025-04-12)
See diff
`I` as in the ISA manual version 2.2 is not the `I` extension with version 2.2 (actually, the ISA manual version 2.2 defines version 2.0 of RV32I and RV64I, and the split `I` base indicates version 2.1 of RV32I and RV64I).
Normally, it loops twice. Termination of the loop (in finite time) can be easily proved from two facts: (1) we have a finite number of feature flags and (2) we never unset any feature flags. In fact, even in the worst case (including the case where the feature flags are completely broken), the maximum loop count is currently 4 (formally proven).
To implement this, the author added automatic derives of `PartialEq` and `Eq` to `cache::Initializer`.
Version 7 (2025-04-12)
See diff
`unaligned-scalar-mem` and `unaligned-vector-mem`).
`imply_features` to a separate module as suggested by @taiki-e.
Now, it's an official proposal (no longer a draft).
Version 8 (2025-04-12)
See diff
`unaligned-scalar-mem` and `unaligned-vector-mem`
(make them `"unaligned-scalar-mem"` and `"unaligned-vector-mem"` for consistency).
Version 9 (2025-04-12)
See diff
`Zifencei` extension from the Linux `IMA` base.
The author initially excluded this because `fence.i` of the `Zifencei` extension is normally invalid on the Linux ABI, but found that this is only normally true. If Concurrent Modification and Execution of Instructions (CMODX) is enabled, `fence.i` can be valid in the Linux userland. Even if CMODX is not enabled, it will not cause any traps, so a "use at your own risk" policy should work (`fence.i` is generally unreliable on SMP-enabled systems with preemptive multi-threading, not just Linux).
Version 10 (2025-04-12)
See diff
`Zvfh` → `Zfhmin`
`Zawrs` extension)
`Zvknhb` → `Zvknha`
`Zbc` → `Zbkc`
`Zvbb` → `Zvkb` (from PR v1; comment removed in PR v11 as there is strong evidence now)
`Zvknhb` → `Zvknha` (from PR v10)
`Zbc` → `Zbkc` (from PR v10)
`Zvfh` → `Zvfhmin` (from PR v1)
`Zkr` → `Zicsr` (from PR v1)
Version 11 (2025-04-13)
See diff
This is functionally equivalent to the version 10 but incorporates minor changes.
`Zvbb` → `Zvkb` (remove the "defined as subset" comment, which denoted that there was only weak evidence).
`Zvknhb` → `Zvknha` implication after group definitions inside `imply_features`.
It minimizes the worst-case iteration count.
Although the big loop inside this function is designed to be ordering-free (moving `imply!` and `group!` macro uses does not introduce a bug, and implications can be added in any order), at least my contribution is also designed to minimize the iteration count.
Version 12 (2025-04-13)
See diff
Oh no... How could I miss that section for years!?
`Zkr` → `Zicsr` (considered not an erratum)
Although the `seed` CSR must be accessed through CSR instructions, originally defined in the `Zicsr` extension, the scalar cryptography spec defines its own `seed` CSR access instruction (a subset of `Zicsr`).
I (somehow) did not catch this for years.
Version 13 (2025-04-15)
See diff
Although this is functionally equivalent to version 12, it is now tested.
Version 14 (2025-04-16)
See diff