The RISC-V vector C intrinsics provide interfaces at the C language level to directly leverage the RISC-V "V" extension 0 (also abbreviated as "RVV"), with help from the compiler to handle instruction scheduling and register allocation.
This document uses the term "v-spec" to refer to the RISC-V "V" extension specification 0.
The __riscv_v_intrinsic macro is the test macro to check the availability and the version of the RISC-V vector C intrinsics API supported by the compiler.
The value of the architecture extension test macro is defined as its version, which is computed by the following formula. The formula is identical to the one defined in the RISC-V C API specification 1.
<MAJOR_VERSION> * 1,000,000 + <MINOR_VERSION> * 1,000 + <REVISION_VERSION>
For example, the v1.0 version should define the macro with the value 1000000.
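As an illustration, a minimal sketch of a compile-time version check; the threshold 1000000 corresponds to v1.0 per the formula above, and the guarded content is only a placeholder comment.
#if defined(__riscv_v_intrinsic) && (__riscv_v_intrinsic >= 1000000)
/* At least v1.0 of the RVV intrinsics API is available here. */
#endif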
To leverage the intrinsics in the compiler, the header <riscv_vector.h> needs to be included. We suggest guarding the inclusion with the test macro of the intrinsics.
#ifdef __riscv_v_intrinsic
#include <riscv_vector.h>
#endif /* __riscv_v_intrinsic */
With <riscv_vector.h> included, the availability of each intrinsic depends on the architecture required by its corresponding vector instruction. The supported architecture is passed to the compiler using the -march option 2,3.
V-spec 0 section 18.2 describes the Zve* extensions and how the instructions are supported.
The intrinsics allow users to control fields in vtype, as well as the rounding modes for the fixed-point (vxrm) and floating-point (frm) intrinsics.
In this section, we cover how the intrinsics embed the control of vtype fields in the function names. Please see Naming scheme for the rules described in this section.
The RISC-V vector intrinsics data types are strongly-typed, and hence so are the intrinsics taking these types as parameters. The vector intrinsics encode the EEW (effective element width) 17 and EMUL (effective LMUL) 17 in the suffix of the function name. Users can expect the results of the vector instruction intrinsics to be computed under the specified EEW and EMUL.
To see the full list of data types for the intrinsics, please see Type system.
In the following example, the intrinsic will produce the result of vadd.vv with vector operands of EEW=32 and EMUL=1.
vint32m1_t __riscv_vadd_vv_i32m1 (vint32m1_t op1, vint32m1_t op2, size_t vl);
In the following example, the intrinsic will produce the result of vwadd.vv with destinations of EEW=32 and EMUL=1, and operands of EEW=16 and EMUL=mf2 (1/2).
vint32m1_t __riscv_vwadd_vv_i32m1 (vint16mf2_t op1, vint16mf2_t op2, size_t vl);
Direct control of vl (vector length register) 11 is not exposed at the C language level. In the intrinsics that represent vector instructions, the vl operand indicates the active vector length valid during the execution of the intrinsic.
Note: Intrinsics for instructions that behave the same with different vl settings (e.g. vmv.s.x) do not have a vl operand.
Note: Intrinsics operate on at most vlmax (derived in section 3.4.2 of the v-spec) elements.
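To illustrate how vl flows through the intrinsics, the following is a minimal strip-mining sketch. The vsetvl and load/store intrinsic names are assumed from the standard intrinsics set rather than introduced in this section.
#include <riscv_vector.h>
#include <stddef.h>
#include <stdint.h>

// Element-wise add of two int32 arrays. Each iteration requests an active
// vector length (vl <= vlmax) and passes it to every intrinsic in the body.
void add_arrays(int32_t *dst, const int32_t *a, const int32_t *b, size_t n) {
  for (size_t i = 0; i < n;) {
    size_t vl = __riscv_vsetvl_e32m1(n - i);           // assumed vsetvl intrinsic
    vint32m1_t va = __riscv_vle32_v_i32m1(a + i, vl);  // load vl elements
    vint32m1_t vb = __riscv_vle32_v_i32m1(b + i, vl);
    vint32m1_t vc = __riscv_vadd_vv_i32m1(va, vb, vl); // EEW=32, EMUL=1
    __riscv_vse32_v_i32m1(dst + i, vc, vl);            // store vl elements
    i += vl;
  }
}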
Instructions that are available for masking 7 are also supported in the intrinsics.
The intrinsics fuse the control of vector masking (vm) together with the control of the policy behavior in the same suffix. Please see Control of vta, vma and Policy naming scheme for the exact suffix that specifies a masked/unmasked vector operation along with its policy behavior.
The behavior of destination tail elements and destination inactive masked-off elements is controlled by the vta and vma bits 6.
Given the general assumption that the target audience of the RVV intrinsics is high-performance cores, and that an "undisturbed" policy will generally slow down an out-of-order core, the intrinsics have a default policy scheme of tail-agnostic and mask-agnostic (that is, vta=1 and vma=1).
Please see Policy naming scheme for the exact suffix that specifies the policy behavior of a vector operation.
For the fixed-point intrinsics, representing the instructions under v-spec section 12, the vxrm operand of the intrinsics indicates the rounding mode (vxrm) 8 control.
The vxrm operand is required to be an integer constant expression. The compiler should implement the following enum that maps to the rounding mode values defined under v-spec Table 4 8.
enum __RISCV_VXRM {
  __RISCV_VXRM_RNU = 0, // round-to-nearest-up (add +0.5 LSB)
  __RISCV_VXRM_RNE = 1, // round-to-nearest-even
  __RISCV_VXRM_RDN = 2, // round-down (truncate)
  __RISCV_VXRM_ROD = 3, // round-to-odd (OR bits into LSB, a.k.a. "jam")
};
Note: Computations of vsadd, vsaddu, vssub, and vssubu do not need the rounding mode; therefore the intrinsics of these instructions do not have the vxrm parameter.
Note: The RISC-V psABI 9 states that vxrm is not preserved across calls. Optimization to reduce the number of redundant writes to vxrm is a compiler- and system-specific issue.
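As an illustration of passing the rounding mode as a constant operand, the following sketch uses an averaging-add intrinsic; its exact prototype is an assumption based on the naming scheme described later.
#include <riscv_vector.h>
#include <stddef.h>

// Averaging add rounded with round-to-nearest-up (RNU). The vxrm argument
// must be an integer constant expression such as the enumerators above.
vint32m1_t average_rnu(vint32m1_t op1, vint32m1_t op2, size_t vl) {
  return __riscv_vaadd_vv_i32m1(op1, op2, __RISCV_VXRM_RNU, vl);
}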
For the floating-point intrinsics, representing the instructions under v-spec section 13, the intrinsics fall into two classes, called implicit-frm and explicit-frm.
The implicit-frm intrinsics behave like any C-language floating-point expression, using the default rounding mode when FENV_ACCESS is off, and using the fenv dynamic rounding mode when FENV_ACCESS is on.
Note: Both GNU and LLVM compilers generate scalar floating-point instructions using the dynamic rounding mode, relying on kernel initialization to set frm to RNE (specified as "roundTiesToEven" in IEEE-754, a.k.a. IEC 60559).
Note: The implicit-frm intrinsics are intended to be used regardless of FENV_ACCESS. They are provided when FENV_ACCESS is on for the (few) programmers who are already using fenv, and they are provided when FENV_ACCESS is off for the (vast majority of) programmers who prefer the default rounding mode.
The explicit-frm intrinsics contain the frm operand, which indicates the rounding mode (frm) 10 control. The floating-point intrinsics with the frm operand carry an _rm suffix in the function name.
The frm operand is required to be an integer constant expression. The compiler should implement the following enum that maps to the rounding mode values defined under RISC-V ISA Manual Table 8.1 8.
enum __RISCV_FRM {
  __RISCV_FRM_RNE = 0, // round to nearest, ties to even
  __RISCV_FRM_RTZ = 1, // round towards zero
  __RISCV_FRM_RDN = 2, // round down (towards -infinity)
  __RISCV_FRM_RUP = 3, // round up (towards +infinity)
  __RISCV_FRM_RMM = 4, // round to nearest, ties to max magnitude
};
Note: The explicit-frm intrinsics are intended to be used when FENV_ACCESS is off, to enable more aggressive optimization while still providing the programmer with control over the rounding mode. Using explicit-frm intrinsics when FENV_ACCESS is on will still work correctly, but is expected to lead to extra saving/restoring of frm, which could be avoided by using fenv functionality and implicit-frm.
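As an illustration, the following sketch uses an explicit-frm (_rm-suffixed) floating-point add; its exact prototype is an assumption based on the naming scheme described later.
#include <riscv_vector.h>
#include <stddef.h>

// Explicit-frm addition rounded towards zero (RTZ). The frm argument must be
// an integer constant expression such as the enumerators above.
vfloat32m1_t add_rtz(vfloat32m1_t op1, vfloat32m1_t op2, size_t vl) {
  return __riscv_vfadd_vv_f32m1_rm(op1, op2, __RISCV_FRM_RTZ, vl);
}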
The naming scheme of the intrinsics expresses the users' control of fields in vtype, as well as the rounding modes for the fixed-point and floating-point intrinsics. For details of the vtype control, please see [control-to-vector-programming-mode].
The RVV intrinsics can be split into two major types, called "explicit (non-overloaded) intrinsics" and "implicit (overloaded) intrinsics".
The explicit (non-overloaded) intrinsics embed the control described under Control of the vector extension programming model in the function name. This scheme makes intrinsics programs easier to read, since the execution states are explicitly specified in the code.
The implicit (overloaded) intrinsics, on the contrary, hide the explicit specification of the vtype control. The implicit (overloaded) intrinsics aim to provide a generic interface that lets users pass values of different EEW and EMUL as input operands.
This section covers the general naming rules of the two types of intrinsics accordingly. Then, this section also enumerates the exceptions and the rationale behind them in [chapter-exception-naming].
With the default policy scheme mentioned under Control of vta, vma, each intrinsic provides corresponding variants for their available control of the policy behavior. The following list enumerates the control of the policy behavior and the corresponding suffixes.

- No suffix: Represents an unmasked (vm=1) vector operation with tail-agnostic (vta=1)
- _tu suffix: Represents an unmasked (vm=1) vector operation with tail-undisturbed (vta=0)
- _m suffix: Represents a masked (vm=0) vector operation with tail-agnostic (vta=1), mask-agnostic (vma=1)
- _tum suffix: Represents a masked (vm=0) vector operation with tail-undisturbed (vta=0), mask-agnostic (vma=1)
- _mu suffix: Represents a masked (vm=0) vector operation with tail-agnostic (vta=1), mask-undisturbed (vma=0)
- _tumu suffix: Represents a masked (vm=0) vector operation with tail-undisturbed (vta=0), mask-undisturbed (vma=0)
Using vadd with EEW=32 and EMUL=1 as an example, the variants are:
// vm=1, vta=1
vint32m1_t __riscv_vadd_vv_i32m1(vint32m1_t op1, vint32m1_t op2, size_t vl);
// vm=1, vta=0
vint32m1_t __riscv_vadd_vv_i32m1_tu(vint32m1_t maskedoff, vint32m1_t op1,
vint32m1_t op2, size_t vl);
// vm=0, vta=1, vma=1
vint32m1_t __riscv_vadd_vv_i32m1_m(vbool32_t mask, vint32m1_t op1,
vint32m1_t op2, size_t vl);
// vm=0, vta=0, vma=1
vint32m1_t __riscv_vadd_vv_i32m1_tum(vbool32_t mask, vint32m1_t maskedoff,
vint32m1_t op1, vint32m1_t op2, size_t vl);
// vm=0, vta=1, vma=0
vint32m1_t __riscv_vadd_vv_i32m1_mu(vbool32_t mask, vint32m1_t maskedoff,
vint32m1_t op1, vint32m1_t op2, size_t vl);
// vm=0, vta=0, vma=0
vint32m1_t __riscv_vadd_vv_i32m1_tumu(vbool32_t mask, vint32m1_t maskedoff,
vint32m1_t op1, vint32m1_t op2,
size_t vl);
Note: When policy is set to "agnostic", there is no guarantee of what will be in the tail/masked-off elements. Under this policy, users should not assume the values within to be deterministic. With no passthrough operand, the compiler can pick any register as the destination register.
Note: Pseudo intrinsics mentioned under Pseudo intrinsics do not map to real vector instructions. Therefore these intrinsics are not affected by the policy setting, nor do they have intrinsic variants with the policy suffixes listed above.
Intrinsics whose computation depends on the value held in vd have a passthrough operand. The following list enumerates the intrinsics that have a passthrough operand; a sketch follows the list. Please see (Appendix: list of prototypes of intrinsics) for the exact prototypes.

- Intrinsics with tail-undisturbed (vta=0)
- Intrinsics with mask-undisturbed (vma=0)
- Intrinsics representing Vector Multiply-Add Operations 13
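As a sketch of a passthrough operand, the following uses a multiply-add intrinsic whose prototype is assumed from the standard intrinsics set.
#include <riscv_vector.h>
#include <stddef.h>

// Computes vd + vs1 * vs2 element-wise. The first operand (vd) is the
// passthrough/accumulator: its value is read and its register is overwritten
// with the result.
vfloat32m1_t fused_madd(vfloat32m1_t vd, vfloat32m1_t vs1, vfloat32m1_t vs2,
                        size_t vl) {
  return __riscv_vfmacc_vv_f32m1(vd, vs1, vs2, vl);
}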
In general, the intrinsics are encoded as follows. The intrinsics under this naming scheme are the "non-overloaded intrinsics"; in parallel, we have the "overloaded intrinsics" defined under Implicit (Overloaded) naming scheme.
__riscv_{V_INSTRUCTION_MNEMONIC}_{OPERAND_MNEMONIC}_{RETURN_TYPE}_{ROUND_MODE}_{POLICY}(...)
- OPERAND_MNEMONIC are like vv, vx, vs, vvm, vxm
- Depending on whether the RETURN_TYPE is a mask register:
  - For intrinsics that have a non-mask register as the destination register:
    - EEW is one of i8 | i16 | i32 | i64 | u8 | u16 | u32 | u64 | f16 | f32 | f64
    - EMUL is one of m1 | m2 | m4 | m8 | mf2 | mf4 | mf8
  - For intrinsics that have a mask register as the destination register:
    - The return type is one of b1 | b2 | b4 | b8 | b16 | b32 | b64, which is derived from the ratio EEW/EMUL
- V_INSTRUCTION_MNEMONIC are like vadd, vfmacc, vsadd
  - Type system explains the limited enumeration of EEW, EMUL pairs
- ROUND_MODE is the _rm suffix mentioned in Explicit-frm intrinsics. Other intrinsics do not have this suffix
- POLICY suffixes are enumerated under [chapter-control-to-policy]
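As an illustration of the scheme, the following decomposition of one explicit intrinsic name is a sketch; the exact prototype is an assumption derived from the rules above.
/*
 *   vfloat32m1_t __riscv_vfadd_vv_f32m1_rm_tum(
 *       vbool32_t mask, vfloat32m1_t maskedoff,
 *       vfloat32m1_t op1, vfloat32m1_t op2,
 *       unsigned int frm, size_t vl);
 *
 *   vfadd -> V_INSTRUCTION_MNEMONIC
 *   vv    -> OPERAND_MNEMONIC (vector-vector)
 *   f32m1 -> RETURN_TYPE (EEW=32 floating-point, EMUL=1)
 *   _rm   -> ROUND_MODE (carries an explicit frm operand)
 *   _tum  -> POLICY (masked, tail-undisturbed, mask-agnostic)
 */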
The general naming scheme is not sufficient to express all intrinsics. The exceptions are enumerated under Exceptions in the explicit (non-overloaded) naming scheme.
This section enumerates the exceptions in the naming scheme.
Only encoding the return type would cause naming collisions for the permutation instruction intrinsics. The intrinsics encode the input vector type and the output scalar type in the suffix.
int8_t __riscv_vmv_x_s_i8m1_i8 (vint8m1_t vs2, size_t vl);
int8_t __riscv_vmv_x_s_i8m2_i8 (vint8m2_t vs2, size_t vl);
int8_t __riscv_vmv_x_s_i8m4_i8 (vint8m4_t vs2, size_t vl);
int8_t __riscv_vmv_x_s_i8m8_i8 (vint8m8_t vs2, size_t vl);
Only encoding the return type would cause naming collisions for the reduction instruction intrinsics. The intrinsics encode the input vector type and the output vector type in the suffix.
vint8m1_t __riscv_vredsum_vs_i8m1_i8m1(vint8m1_t dest, vint8m1_t vs2, vint8m1_t vs1,
                                       size_t vl);
vint8m1_t __riscv_vredsum_vs_i8m2_i8m1(vint8m1_t dest, vint8m2_t vs2, vint8m1_t vs1,
                                       size_t vl);
vint8m1_t __riscv_vredsum_vs_i8m4_i8m1(vint8m1_t dest, vint8m4_t vs2, vint8m1_t vs1,
                                       size_t vl);
vint8m1_t __riscv_vredsum_vs_i8m8_i8m1(vint8m1_t dest, vint8m8_t vs2, vint8m1_t vs1,
                                       size_t vl);
Only encoding the return type would cause naming collisions for these pseudo intrinsics. The intrinsics encode the input vector type and the output vector type in the suffix.
The following shows an example with __riscv_vreinterpret_v_i32m1*.
vfloat32m1_t __riscv_vreinterpret_v_i32m1_f32m1 (vint32m1_t src);
vuint32m1_t __riscv_vreinterpret_v_i32m1_u32m1 (vint32m1_t src);
vint8m1_t __riscv_vreinterpret_v_i32m1_i8m1 (vint32m1_t src);
vint16m1_t __riscv_vreinterpret_v_i32m1_i16m1 (vint32m1_t src);
vint64m1_t __riscv_vreinterpret_v_i32m1_i64m1 (vint32m1_t src);
vbool64_t __riscv_vreinterpret_v_i32m1_b64 (vint32m1_t src);
vbool32_t __riscv_vreinterpret_v_i32m1_b32 (vint32m1_t src);
vbool16_t __riscv_vreinterpret_v_i32m1_b16 (vint32m1_t src);
vbool8_t __riscv_vreinterpret_v_i32m1_b8 (vint32m1_t src);
vbool4_t __riscv_vreinterpret_v_i32m1_b4 (vint32m1_t src);
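As a usage sketch of the prototypes above, reinterpretation only relabels the raw bits; no value conversion takes place.
#include <riscv_vector.h>

// View the bits of an i32m1 vector as an f32m1 vector (same EEW and EMUL).
vfloat32m1_t bits_as_float(vint32m1_t v) {
  return __riscv_vreinterpret_v_i32m1_f32m1(v);
}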
The overloaded interface aims to let users pass values of different EEW and EMUL as input operands, therefore hiding the EEW and EMUL encoded in the function name. The _rm suffix for explicit-frm intrinsics (Control of frm) is also hidden. The intrinsics under this scheme are the "overloaded intrinsics"; in parallel, we have the "non-overloaded intrinsics" defined under Explicit (Non-overloaded) naming scheme.
Taking the vector add (vadd) as an example, stripping off the operand mnemonics and the encoded EEW and EMUL information, the intrinsics API provides the following overloaded interfaces.
vint32m1_t __riscv_vadd(vint32m1_t v0, vint32m1_t v1, size_t vl);
vint16m4_t __riscv_vadd(vint16m4_t v0, vint16m4_t v1, size_t vl);
Since the main intent is to let users pass different values of EEW and EMUL as input operands, the overloaded intrinsics do not hide the policy suffix. That is, the suffixes listed under [chapter-control-to-policy] are not hidden and are still encoded in the function name, except for the masked, tail-agnostic, mask-agnostic (vm=0, vta=1, vma=1) variant. Taking the vector floating-point add (vfadd) as an example, the intrinsics API provides the following overloaded interfaces.
vfloat32m1_t __riscv_vfadd(vbool32_t mask, vfloat32m1_t op1, vfloat32m1_t op2,
unsigned int frm, size_t vl);
vfloat16m4_t __riscv_vfadd(vbool4_t mask, vfloat16m4_t op1, vfloat16m4_t op2,
unsigned int frm, size_t vl);
vfloat32m1_t __riscv_vfadd_tu(vfloat32m1_t maskedoff, vfloat32m1_t op1,
vfloat32m1_t op2, size_t vl);
vfloat32m1_t __riscv_vfadd_tum(vbool32_t mask, vfloat32m1_t maskedoff,
vfloat32m1_t op1, vfloat32m1_t op2, size_t vl);
vfloat32m1_t __riscv_vfadd_tumu(vbool32_t mask, vfloat32m1_t maskedoff,
vfloat32m1_t op1, vfloat32m1_t op2, size_t vl);
vfloat32m1_t __riscv_vfadd_mu(vbool32_t mask, vfloat32m1_t maskedoff,
vfloat32m1_t op1, vfloat32m1_t op2, size_t vl);
vfloat32m1_t __riscv_vfadd_tu(vfloat32m1_t maskedoff, vfloat32m1_t op1,
vfloat32m1_t op2, unsigned int frm, size_t vl);
vfloat32m1_t __riscv_vfadd_tum(vbool32_t mask, vfloat32m1_t maskedoff,
vfloat32m1_t op1, vfloat32m1_t op2,
unsigned int frm, size_t vl);
vfloat32m1_t __riscv_vfadd_tumu(vbool32_t mask, vfloat32m1_t maskedoff,
vfloat32m1_t op1, vfloat32m1_t op2,
unsigned int frm, size_t vl);
vfloat32m1_t __riscv_vfadd_mu(vbool32_t mask, vfloat32m1_t maskedoff,
vfloat32m1_t op1, vfloat32m1_t op2,
unsigned int frm, size_t vl);
The naming scheme does not cover all of the intrinsics. Please see Exceptions in the implicit (overloaded) naming scheme for overloaded intrinsics with irregular naming patterns.
Due to the limitations of the C language (without the aid of features like C++ templates), some intrinsics do not have an overloaded version. Therefore these intrinsics do not possess a simplified, EEW/EMUL-hidden interface. Please see Un-supported intrinsics for implicit (overloaded) naming scheme for more detail.
The following intrinsics have an irregular naming pattern.
Widening instruction intrinsics (e.g. vwadd) have the same return type but different parameters. The operand mnemonics are encoded into their overloaded versions to help distinguish them.
vint32m1_t __riscv_vwadd_vv (vint16mf2_t op1, vint16mf2_t op2, size_t vl);
vint32m1_t __riscv_vwadd_vx (vint16mf2_t op1, int16_t op2, size_t vl);
vint32m1_t __riscv_vwadd_wv (vint32m1_t op1, vint16mf2_t op2, size_t vl);
vint32m1_t __riscv_vwadd_wx (vint32m1_t op1, int16_t op2, size_t vl);
Type-convert instruction intrinsics (e.g. vfcvt.f.x) encode the return value mnemonics (e.g. vfcvt_f) into their overloaded variants to help distinguish them.
vfloat32m1_t __riscv_vfcvt_f_tu(vfloat32m1_t maskedoff, vint32m1_t src,
size_t vl);
vfloat32m1_t __riscv_vfcvt_f_tum(vbool32_t mask, vfloat32m1_t maskedoff,
vint32m1_t src, size_t vl);
vfloat32m1_t __riscv_vfcvt_f_tumu(vbool32_t mask, vfloat32m1_t maskedoff,
vint32m1_t src, size_t vl);
vfloat32m1_t __riscv_vfcvt_f_mu(vbool32_t mask, vfloat32m1_t maskedoff,
vint32m1_t src, size_t vl);
vfloat32m1_t __riscv_vfcvt_f_tu(vfloat32m1_t maskedoff, vint32m1_t src,
unsigned int frm, size_t vl);
vfloat32m1_t __riscv_vfcvt_f_tum(vbool32_t mask, vfloat32m1_t maskedoff,
vint32m1_t src, unsigned int frm, size_t vl);
vfloat32m1_t __riscv_vfcvt_f_tumu(vbool32_t mask, vfloat32m1_t maskedoff,
vint32m1_t src, unsigned int frm, size_t vl);
vfloat32m1_t __riscv_vfcvt_f_mu(vbool32_t mask, vfloat32m1_t maskedoff,
vint32m1_t src, unsigned int frm, size_t vl);
These pseudo intrinsics (e.g. vreinterpret) encode the return type (e.g. __riscv_vreinterpret_b8) into their overloaded variants to help distinguish them.
vbool8_t __riscv_vreinterpret_b8 (vint8m1_t src);
vbool8_t __riscv_vreinterpret_b8 (vuint8m1_t src);
vbool8_t __riscv_vreinterpret_b8 (vint16m1_t src);
vbool8_t __riscv_vreinterpret_b8 (vuint16m1_t src);
vbool8_t __riscv_vreinterpret_b8 (vint32m1_t src);
vbool8_t __riscv_vreinterpret_b8 (vuint32m1_t src);
vbool8_t __riscv_vreinterpret_b8 (vint64m1_t src);
vbool8_t __riscv_vreinterpret_b8 (vuint64m1_t src);
Due to the limitations of the C language (without the aid of features like C++ templates), some intrinsics do not have an overloaded version. Intrinsics with any of the following characteristics do not possess an overloaded version.
- Intrinsics whose input arguments are all scalar types and scalar types alone (e.g. vector load instruction intrinsics, vector move instruction intrinsics)
- Intrinsics without any input argument (e.g. vmclr, vmset, vid)
- Intrinsics with vector boolean input(s), returning a non-boolean vector type (e.g. viota)
The RVV intrinsics are designed to be strongly-typed. The API provides vreinterpret intrinsics to help users cross the strongly-typed scheme when necessary.
Non-mask (integer and floating-point) data types have SEW and LMUL encoded.
The integer types have EEW and EMUL encoded into the type. The first row describes the EMUL and the first column describes the data type and element width of the scalar type.
Types shown in bold are only available when ELEN >= 64 (that is, they are unavailable under Zve32*).
| Types    | EMUL=1/8        | EMUL=1/4         | EMUL=1/2         | EMUL=1          | EMUL=2          | EMUL=4          | EMUL=8          |
|----------|-----------------|------------------|------------------|-----------------|-----------------|-----------------|-----------------|
| int8_t   | **vint8mf8_t**  | vint8mf4_t       | vint8mf2_t       | vint8m1_t       | vint8m2_t       | vint8m4_t       | vint8m8_t       |
| int16_t  | N/A             | **vint16mf4_t**  | vint16mf2_t      | vint16m1_t      | vint16m2_t      | vint16m4_t      | vint16m8_t      |
| int32_t  | N/A             | N/A              | **vint32mf2_t**  | vint32m1_t      | vint32m2_t      | vint32m4_t      | vint32m8_t      |
| int64_t  | N/A             | N/A              | N/A              | **vint64m1_t**  | **vint64m2_t**  | **vint64m4_t**  | **vint64m8_t**  |
| uint8_t  | **vuint8mf8_t** | vuint8mf4_t      | vuint8mf2_t      | vuint8m1_t      | vuint8m2_t      | vuint8m4_t      | vuint8m8_t      |
| uint16_t | N/A             | **vuint16mf4_t** | vuint16mf2_t     | vuint16m1_t     | vuint16m2_t     | vuint16m4_t     | vuint16m8_t     |
| uint32_t | N/A             | N/A              | **vuint32mf2_t** | vuint32m1_t     | vuint32m2_t     | vuint32m4_t     | vuint32m8_t     |
| uint64_t | N/A             | N/A              | N/A              | **vuint64m1_t** | **vuint64m2_t** | **vuint64m4_t** | **vuint64m8_t** |
The floating-point types have EEW and EMUL encoded into the type. The first row describes the EMUL and the first column describes the data type and element width of the scalar type.
Floating-point types with an element width of 16 require the zvfh or zvfhmin extension to be specified in the architecture.
Floating-point types with an element width of 32 require the zve32f extension to be specified in the architecture.
Floating-point types with an element width of 64 require the zve64d extension to be specified in the architecture.
| Types     | EMUL=1/8 | EMUL=1/4      | EMUL=1/2      | EMUL=1       | EMUL=2       | EMUL=4       | EMUL=8       |
|-----------|----------|---------------|---------------|--------------|--------------|--------------|--------------|
| float16_t | N/A      | vfloat16mf4_t | vfloat16mf2_t | vfloat16m1_t | vfloat16m2_t | vfloat16m4_t | vfloat16m8_t |
| float32_t | N/A      | N/A           | vfloat32mf2_t | vfloat32m1_t | vfloat32m2_t | vfloat32m4_t | vfloat32m8_t |
| float64_t | N/A      | N/A           | N/A           | vfloat64m1_t | vfloat64m2_t | vfloat64m4_t | vfloat64m8_t |
The mask types encode the ratio n that is derived from EEW/EMUL. The mask types represent mask register values that follow the Mask Register Layout 14.
| Types | n = 1    | n = 2    | n = 4    | n = 8    | n = 16    | n = 32    | n = 64    |
|-------|----------|----------|----------|----------|-----------|-----------|-----------|
| bool  | vbool1_t | vbool2_t | vbool4_t | vbool8_t | vbool16_t | vbool32_t | vbool64_t |
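As a sketch of how the ratio ties data types to mask types, comparing two i32m1 vectors (ratio n = 32/1 = 32) produces a vbool32_t mask; the comparison intrinsic name is assumed from the standard intrinsics set.
#include <riscv_vector.h>
#include <stddef.h>

// Element-wise equality of two i32m1 vectors yields a vbool32_t mask.
vbool32_t equal_lanes(vint32m1_t a, vint32m1_t b, size_t vl) {
  return __riscv_vmseq_vv_i32m1_b32(a, b, vl);
}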
The tuple types encode SEW, LMUL, and NFIELD 16 into the data type.
These types are utilized for the segment load/store instruction intrinsics; the types listed in Integer types and Floating-point types all have tuple types. The combinations of LMUL and NFIELD follow the restriction in the v-spec: EMUL * NFIELDS ≤ 8.
The full list of types is attached in the Appendix.
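As a sketch, the tuple type vint32m1x2_t encodes SEW=32, LMUL=1, and NFIELD=2; the segment-load intrinsic name below is an assumption based on the standard naming scheme.
#include <riscv_vector.h>
#include <stddef.h>
#include <stdint.h>

// Load vl pairs of interleaved int32 values into a two-field tuple.
vint32m1x2_t load_pairs(const int32_t *interleaved, size_t vl) {
  return __riscv_vlseg2e32_v_i32m1x2(interleaved, vl);
}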
Some users may expect the intrinsics to translate directly into assembly; however, the intrinsics are interfaces that expose the vector instruction semantics, and the compiler is free to optimize them out when there is an opportunity.
The compiler should guarantee that the correct vtype is set given the EEW and EMUL specified in the intrinsic function name suffix, and the data type of the operand(s).
The V extension spec mentions 15 that a strided load/store instruction with a stride of 0 may either perform all memory accesses or perform fewer memory operations. Since requiring all memory accesses is unlikely to be common, the compiler implementation is allowed to generate fewer memory operations for strided load/store intrinsics.
In other words, the compiler does not guarantee that all memory accesses are performed for strided load/store intrinsics with a stride of 0. If the user needs all memory accesses to be performed, they should use indexed load/store intrinsics with all-zero indices.
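A sketch contrasting the two forms follows; the strided-load, splat, and indexed-load intrinsic names are assumed from the standard intrinsics set.
#include <riscv_vector.h>
#include <stddef.h>
#include <stdint.h>

void broadcast_first_element(const int32_t *base, size_t vl) {
  // Stride of 0: the compiler may emit fewer memory accesses than vl.
  vint32m1_t a = __riscv_vlse32_v_i32m1(base, 0, vl);
  // All-zero indices: every active element performs its own memory access.
  vuint32m1_t zero_idx = __riscv_vmv_v_x_u32m1(0, vl);
  vint32m1_t b = __riscv_vluxei32_v_i32m1(base, zero_idx, vl);
  (void)a;
  (void)b;
}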
4 Section 3.4.1 (Vector selected element width vsew[2:0]) in v-spec 0
5 Section 3.4.2 (Vector Register Grouping vlmul[2:0]) in v-spec 0
6 Section 3.4.3 (Vector Tail Agnostic and Vector Mask Agnostic vta and vma) in v-spec 0
7 Section 5.3 (Vector Masking) in v-spec 0
8 Section 3.8 (Vector Fixed-Point Rounding Mode Register vxrm) in v-spec 0
11 Section 3.5 (Vector Length Register) in v-spec 0
12 Section 3.4.2 in v-spec 0
13 Section 11.13, 11.14, 13.6, 13.7 in v-spec 0
14 Section 4.5 (Mask Register Layout) in v-spec 0
15 Section 7.5 in v-spec 0
16 Section 7.8 in v-spec 0
17 Section 5.2 (Vector Operands) in v-spec 0