Introduction

The RISC-V vector C intrinsics provide users with interfaces at the C language level to directly leverage the RISC-V "V" extension 0 (also abbreviated as "RVV"), with help from the compiler to handle instruction scheduling and register allocation.

This document will use the term "v-spec" to indicate the RISC-V "V" extension specification 0.

Test macro

The __riscv_v_intrinsic macro is the test macro to check the availability and version the compiler supports for the RISC-V vector C intrinsic API.

The value of the test macro is defined as its version, which is computed by the following formula. The formula is identical to the one defined for the architecture extension test macros under the RISC-V C API specification 1.

<MAJOR_VERSION> * 1,000,000 + <MINOR_VERSION> * 1,000 + <REVISION_VERSION>

For example, the v1.0 version should define the macro with value 1000000.
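The version formula can be sketched in C as follows. This is a minimal illustration only: the RVV_VERSION* macros and rvv_intrinsics_at_least are hypothetical helper names, not something a compiler or <riscv_vector.h> is known to provide. Only __riscv_v_intrinsic comes from this document.

```c
#include <assert.h>

/* Hypothetical helpers (assumed names, not part of any compiler header). */
#define RVV_VERSION(major, minor, revision) \
    ((major) * 1000000L + (minor) * 1000L + (revision))
#define RVV_VERSION_MAJOR(v)    ((v) / 1000000L)
#define RVV_VERSION_MINOR(v)    (((v) / 1000L) % 1000L)
#define RVV_VERSION_REVISION(v) ((v) % 1000L)

/* Returns 1 if the compiler reports an intrinsic API version of at least
 * major.minor.revision, and 0 otherwise (including when the test macro is
 * not defined at all). */
static int rvv_intrinsics_at_least(long major, long minor, long revision)
{
    assert(major >= 0 && minor >= 0 && minor < 1000 &&
           revision >= 0 && revision < 1000);
#ifdef __riscv_v_intrinsic
    return __riscv_v_intrinsic >= RVV_VERSION(major, minor, revision);
#else
    return 0;
#endif
}
```

With these helpers, RVV_VERSION(1, 0, 0) evaluates to 1000000, matching the v1.0 example above.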

Header file inclusion

To use the intrinsics, the header <riscv_vector.h> needs to be included. We suggest guarding the inclusion with the test macro of the intrinsics.

#ifdef __riscv_v_intrinsic
#include <riscv_vector.h>
#endif /* __riscv_v_intrinsic */

Availability

With <riscv_vector.h> included, the availability of each intrinsic depends on the architecture required by its corresponding vector instruction. The supported architecture is passed to the compiler using the -march option 2,3.

V-spec 0 section 18.2 describes the Zve* extensions and how the instructions are supported.

Control of the vector extension programming model

The intrinsics allow users to control fields in vtype, and also rounding modes for fixed-point (vxrm) and floating-point intrinsics (frm).

In this section, we cover how the intrinsics embed the control of vtype fields in the function names. Please see Naming scheme for the rules described in this section.

Control of eew, emul

The RISC-V vector intrinsics data types are strongly typed, and hence so are the intrinsics taking these types as parameters. The vector intrinsics encode the EEW (effective element width) 17 and EMUL (effective LMUL) 17 in the suffix of the function name. Users can expect the results of the vector instruction intrinsics to be computed under the specified EEW and EMUL.

To see the full list of data types for the intrinsics, please see Type system.

In the following example, the intrinsic will produce the result of vadd.vv with vector operands of EEW=32 and EMUL=1.

vint32m1_t __riscv_vadd_vv_i32m1 (vint32m1_t op1, vint32m1_t op2, size_t vl);

In the following example, the intrinsic will produce the result of vwadd.vv with destinations of EEW=32 and EMUL=1, and operands of EEW=16 and EMUL=mf2 (1/2).

vint32m1_t __riscv_vwadd_vv_i32m1 (vint16mf2_t op1, vint16mf2_t op2, size_t vl);

Control of vl

Direct control of vl (vector length register) 11 is not exposed at the C language level. In the intrinsics that represent vector instructions, the vl operand indicates the active vector length that is valid during the execution of the intrinsic.

Note
Intrinsics for instructions that behave the same with different vl settings (e.g. vmv.s.x) do not have a vl operand.
Note
Intrinsics operate on at most vlmax (derived in 3.4.2 of v-spec) elements.
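As an illustration of how the vl operand is typically supplied, the following sketch strip-mines an element-wise addition. The helper name add_i32 is hypothetical. The RVV path is compiled only when the test macro is defined, and a scalar loop with the same result is used otherwise. The vsetvl intrinsic used here is one of the pseudo intrinsics listed later in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __riscv_v_intrinsic
#include <riscv_vector.h>
#endif /* __riscv_v_intrinsic */

/* Hypothetical helper: dst[i] = a[i] + b[i] for i in [0, n). */
void add_i32(int32_t *dst, const int32_t *a, const int32_t *b, size_t n)
{
    assert(n == 0 || (dst != NULL && a != NULL && b != NULL));
#ifdef __riscv_v_intrinsic
    while (n > 0) {
        /* Ask for up to n elements; vl is what this iteration will process. */
        size_t vl = __riscv_vsetvl_e32m1(n);
        vint32m1_t va = __riscv_vle32_v_i32m1(a, vl);
        vint32m1_t vb = __riscv_vle32_v_i32m1(b, vl);
        vint32m1_t vsum = __riscv_vadd_vv_i32m1(va, vb, vl);
        __riscv_vse32_v_i32m1(dst, vsum, vl);
        a += vl;
        b += vl;
        dst += vl;
        n -= vl;
    }
#else
    /* Scalar fallback with the same wrap-around behavior as vadd.vv. */
    for (size_t i = 0; i < n; ++i)
        dst[i] = (int32_t)((uint32_t)a[i] + (uint32_t)b[i]);
#endif
}
```

Every vector intrinsic in the loop receives the same vl, so the final iteration naturally handles a remainder shorter than vlmax.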

Control of vm

Instructions that are available for masking 7 are also supported in the intrinsics.

The intrinsics fuse the control of vector masking (vm) together with the control for the policy behavior in the same suffix. Please see Control of vta, vma and Policy naming scheme for the exact suffix that specifies a masked/unmasked vector operation along with its policy behavior.

Control of vta, vma

The behavior of destination tail elements and destination inactive masked-off elements is controlled by the vta and vma bits 6.

Given the general assumption that the target audience of the RVV intrinsics is high-performance cores, and that an "undisturbed" policy will generally slow down an out-of-order core, the intrinsics have a default policy scheme of tail-agnostic and mask-agnostic (that is, vta=1 and vma=1).

Please see Policy naming scheme for the exact suffix that specifies the policy behavior of a vector operation.

Control of vxrm

For the fixed-point intrinsics, representing the instructions under v-spec section 12, the vxrm operand of the intrinsics indicates the rounding mode (vxrm) 8 control.

The vxrm operand is required to be an integer constant expression. The compiler should implement the following enum that maps to the defined rounding mode values under v-spec Table 4 8.

enum __RISCV_VXRM {
  __RISCV_VXRM_RNU = 0,
  __RISCV_VXRM_RNE = 1,
  __RISCV_VXRM_RDN = 2,
  __RISCV_VXRM_ROD = 3,
};
Note
Computations of vsadd, vsaddu, vssub, and vssubu do not need the rounding mode, therefore the intrinsics of these instructions do not have the vxrm parameter.
Note
The RISC-V psABI 9 states that vxrm is not preserved across calls. Optimization for reducing the number of redundant writes to vxrm is a compiler and system specific issue.
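To make the four rounding modes concrete, here is a scalar sketch of a rounding right shift, following our reading of the roundoff definition in the v-spec fixed-point chapter. The names vxrm_mode and roundoff_unsigned are hypothetical and are deliberately distinct from the __RISCV_VXRM enumerators above. Only the numeric values mirror that enum.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model; values mirror the __RISCV_VXRM enum. */
enum vxrm_mode { VXRM_RNU = 0, VXRM_RNE = 1, VXRM_RDN = 2, VXRM_ROD = 3 };

/* Shift v right by d bits, rounding the discarded bits according to mode. */
uint64_t roundoff_unsigned(uint64_t v, unsigned d, enum vxrm_mode mode)
{
    assert(d < 64);
    if (d == 0)
        return v; /* nothing is discarded, so no rounding increment */

    uint64_t bit_d     = (v >> d) & 1;       /* LSB of the truncated result */
    uint64_t bit_dm1   = (v >> (d - 1)) & 1; /* most significant discarded bit */
    uint64_t below_dm1 = v & ((UINT64_C(1) << (d - 1)) - 1); /* bits d-2..0 */
    uint64_t discarded = v & ((UINT64_C(1) << d) - 1);       /* bits d-1..0 */
    uint64_t r = 0;

    switch (mode) {
    case VXRM_RNU: r = bit_dm1; break;                            /* round-to-nearest-up */
    case VXRM_RNE: r = bit_dm1 & ((below_dm1 != 0) | bit_d); break; /* ties to even */
    case VXRM_RDN: r = 0; break;                                  /* truncate */
    case VXRM_ROD: r = (bit_d == 0) & (discarded != 0); break;    /* round to odd */
    }
    return (v >> d) + r;
}
```

For instance, shifting 5 right by one bit (that is, 2.5) gives 3 under RNU, 2 under RNE, 2 under RDN, and 3 under ROD in this model.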

Control of frm

For the floating-point intrinsics, representing the instructions under v-spec section 13, the intrinsics have two classes, called implicit-frm and explicit-frm.

Implicit-frm intrinsics

The implicit-frm intrinsics behave like any C-language floating-point expressions, using the default rounding mode when FENV_ACCESS is off, and using the fenv dynamic rounding mode when FENV_ACCESS is on.

Note
Both GNU and LLVM compilers generate scalar floating-point instructions using dynamic rounding mode, relying on the kernel initialization to set frm to RNE (specified as "roundTiesToEven" in IEEE-754 (a.k.a. IEC 60559)).
Note
The implicit-frm intrinsics are intended to be used regardless of FENV_ACCESS. They are provided when FENV_ACCESS is on for the (few) programmers who are already using fenv. And they are provided when FENV_ACCESS is off for the (vast majority of) programmers who prefer the default rounding mode.

Explicit-frm intrinsics

The explicit-frm intrinsics contain the frm operand which indicates the rounding mode (frm) 10 control. The floating-point intrinsics with the frm operand are followed by an _rm suffix in the function name.

The frm operand is required to be an integer constant expression. The compiler should implement the following enum that maps to the defined rounding mode values under RISC-V ISA Manual Table 8.1 8.

enum __RISCV_FRM {
  __RISCV_FRM_RNE = 0,
  __RISCV_FRM_RTZ = 1,
  __RISCV_FRM_RDN = 2,
  __RISCV_FRM_RUP = 3,
  __RISCV_FRM_RMM = 4,
};
Note
The explicit-frm intrinsics are intended to be used when FENV_ACCESS is off, to enable more aggressive optimization while still providing the programmer with control over the rounding mode. Using explicit-frm intrinsics when FENV_ACCESS is on will still work correctly, but is expected to lead to extra saving/restoring of frm, which could be avoided by using fenv functionality and implicit-frm intrinsics.

Naming scheme

The naming scheme of the intrinsics expresses the users' control of fields in vtype, and also rounding modes for the fixed-point and the floating-point intrinsics. For details of the vtype control, please see Control of the vector extension programming model.

The RVV intrinsics can be split into two major types, called "explicit (non-overloaded) intrinsics" and "implicit (overloaded) intrinsics".

The explicit (non-overloaded) intrinsics embed the control described under Control of the vector extension programming model in the function name. This scheme makes programs written with intrinsics easier to read, since the execution state is explicitly specified in the code.

The implicit (overloaded) intrinsics, on the contrary, hide the explicit specifications for vtype control. The implicit (overloaded) intrinsics aim to provide a generic interface to let users put values of different EEW and EMUL as the input operand.

This section covers the general naming rule of the two types of intrinsics accordingly. Then, this section also enumerates the exceptions and the rationale behind them in [chapter-exception-naming].

Policy naming scheme

With the default policy scheme mentioned under Control of vta, vma, each intrinsic provides corresponding variants for their available control of the policy behavior. The following list enumerates the control of the policy behavior and their corresponding suffix.

  • No suffix: Represents an unmasked (vm=1) vector operation with tail-agnostic (vta=1)

  • _tu suffix: Represents an unmasked (vm=1) vector operation with tail-undisturbed (vta=0)

  • _m suffix: Represents a masked (vm=0) vector operation with tail-agnostic (vta=1), mask-agnostic (vma=1)

  • _tum suffix: Represents a masked (vm=0) vector operation with tail-undisturbed (vta=0), mask-agnostic (vma=1)

  • _mu suffix: Represents a masked (vm=0) vector operation with tail-agnostic (vta=1), mask-undisturbed (vma=0)

  • _tumu suffix: Represents a masked (vm=0) vector operation with tail-undisturbed (vta=0), mask-undisturbed (vma=0)

Using vadd with EEW=32 and EMUL=1 as an example, the variants are:

// vm=1, vta=1
vint32m1_t __riscv_vadd_vv_i32m1(vint32m1_t op1, vint32m1_t op2, size_t vl);
// vm=1, vta=0
vint32m1_t __riscv_vadd_vv_i32m1_tu(vint32m1_t maskedoff, vint32m1_t op1,
                                    vint32m1_t op2, size_t vl);
// vm=0, vta=1, vma=1
vint32m1_t __riscv_vadd_vv_i32m1_m(vbool32_t mask, vint32m1_t op1,
                                   vint32m1_t op2, size_t vl);
// vm=0, vta=0, vma=1
vint32m1_t __riscv_vadd_vv_i32m1_tum(vbool32_t mask, vint32m1_t maskedoff,
                                     vint32m1_t op1, vint32m1_t op2, size_t vl);
// vm=0, vta=1, vma=0
vint32m1_t __riscv_vadd_vv_i32m1_mu(vbool32_t mask, vint32m1_t maskedoff,
                                    vint32m1_t op1, vint32m1_t op2, size_t vl);
// vm=0, vta=0, vma=0
vint32m1_t __riscv_vadd_vv_i32m1_tumu(vbool32_t mask, vint32m1_t maskedoff,
                                      vint32m1_t op1, vint32m1_t op2,
                                      size_t vl);
Note
When policy is set to "agnostic", there is no guarantee of what will be in the tail/masked-off elements. Under this policy users should not assume the values within to be deterministic. With no passthrough operand, the compiler can pick any register as the destination register.
Note
Pseudo intrinsics mentioned under Pseudo intrinsics do not map to real vector instructions. Therefore these intrinsics are not affected by the policy setting, nor do they have intrinsic variants of the policy suffix listed above.
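The effect of the policy suffixes on the destination can be sketched with a scalar model. This is an illustration under stated assumptions, not how a compiler implements the intrinsics: model_vadd is a hypothetical name, vd on entry plays the role of the maskedoff operand, and agnostic elements are filled with all ones, which is only one of the outcomes the v-spec permits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of vadd.vv over a register of vlmax elements.
 *   mask == NULL          models an unmasked operation (vm=1)
 *   tail_undisturbed      models vta=0
 *   mask_undisturbed      models vma=0
 * Suffix mapping: _tu   -> (NULL, true,  -)     _m    -> (mask, false, false)
 *                 _tum  -> (mask, true,  false) _mu   -> (mask, false, true)
 *                 _tumu -> (mask, true,  true)                              */
void model_vadd(int32_t *vd, const bool *mask,
                const int32_t *op1, const int32_t *op2,
                size_t vl, size_t vlmax,
                bool tail_undisturbed, bool mask_undisturbed)
{
    assert(vl <= vlmax);
    for (size_t i = 0; i < vlmax; ++i) {
        if (i >= vl) {
            /* Tail element: keep vd[i] only under tail-undisturbed. */
            if (!tail_undisturbed)
                vd[i] = -1; /* agnostic: value must not be relied upon */
        } else if (mask != NULL && !mask[i]) {
            /* Inactive element: keep vd[i] only under mask-undisturbed. */
            if (!mask_undisturbed)
                vd[i] = -1; /* agnostic: value must not be relied upon */
        } else {
            /* Active body element: wrap-around addition. */
            vd[i] = (int32_t)((uint32_t)op1[i] + (uint32_t)op2[i]);
        }
    }
}
```

In this model only the undisturbed variants give the caller a defined value in tail or masked-off positions, which is why those variants take the extra maskedoff operand.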

Passthrough parameters in the intrinsics

Intrinsics whose computation depends on the value held in vd have a passthrough operand. The following list enumerates the intrinsics that have a passthrough operand. Please see (Appendix: list of prototypes of intrinsics) for the exact prototypes.

  • Intrinsics with tail-undisturbed (vta=0)

  • Intrinsics with mask-undisturbed (vma=0)

  • Intrinsics representing Vector Multiply-Add Operations 13

Explicit (Non-overloaded) naming scheme

In general, the intrinsics are encoded as follows. The intrinsics under this naming scheme are the "non-overloaded intrinsics"; their counterparts, the "overloaded intrinsics", are defined under Implicit (Overloaded) naming scheme.

__riscv_{V_INSTRUCTION_MNEMONIC}_{OPERAND_MNEMONIC}_{RETURN_TYPE}_{ROUND_MODE}_{POLICY}(...)
  • OPERAND_MNEMONIC are like vv, vx, vs, vvm, vxm

  • Depending on whether the RETURN_TYPE is a mask register:

    • For intrinsics that have a non-mask register as the destination register

      • EEW is one of i8 | i16 | i32 | i64 | u8 | u16 | u32 | u64 | f16 | f32 | f64.

      • EMUL is one of m1 | m2 | m4 | m8 | mf2 | mf4 | mf8

    • For intrinsics that have a mask register as a destination register

      • The return type is one of b1 | b2 | b4 | b8 | b16 | b32 | b64, which is derived from the ratio EEW/EMUL

  • V_INSTRUCTION_MNEMONIC are like vadd, vfmacc, vsadd.

    • Type system explains the limited enumeration of EEW, EMUL pairs.

  • ROUND_MODE is the _rm suffix mentioned in Explicit-frm intrinsics. Other intrinsics do not have this suffix.

  • POLICY suffixes are enumerated under Policy naming scheme.

The general naming scheme is not sufficient to express all intrinsics. The exceptions are enumerated under Exceptions in the explicit (non-overloaded) naming scheme.
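Setting the exceptions aside, the general pattern can be sketched as a small string builder. The function rvv_intrinsic_name is a hypothetical helper written for illustration. It only concatenates the fields described above and does not check that the resulting intrinsic actually exists.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes
 *   __riscv_<mnemonic>_<operands>_<ret_type>[_<round_mode>][_<policy>]
 * into out. Pass "" for round_mode or policy when the field is absent.
 * Returns 0 on success and -1 if out is too small. */
int rvv_intrinsic_name(char *out, size_t out_size,
                       const char *mnemonic, const char *operands,
                       const char *ret_type, const char *round_mode,
                       const char *policy)
{
    assert(out != NULL && mnemonic != NULL && operands != NULL &&
           ret_type != NULL && round_mode != NULL && policy != NULL);
    int n = snprintf(out, out_size, "__riscv_%s_%s_%s%s%s%s%s",
                     mnemonic, operands, ret_type,
                     strlen(round_mode) > 0 ? "_" : "", round_mode,
                     strlen(policy) > 0 ? "_" : "", policy);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

For example, the fields ("vadd", "vv", "i32m1", "", "tu") yield __riscv_vadd_vv_i32m1_tu, the tail-undisturbed variant shown under Policy naming scheme.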

Exceptions in the explicit (non-overloaded) naming scheme

This section enumerates the exceptions in the naming scheme.

Scalar move instructions

Only encoding the return type will cause naming collisions for the permutation instruction intrinsics. The intrinsics encode the input vector type and the output scalar type in the suffix.

int8_t __riscv_vmv_x_s_i8m1_i8 (vint8m1_t vs2, size_t vl);
int8_t __riscv_vmv_x_s_i8m2_i8 (vint8m2_t vs2, size_t vl);
int8_t __riscv_vmv_x_s_i8m4_i8 (vint8m4_t vs2, size_t vl);
int8_t __riscv_vmv_x_s_i8m8_i8 (vint8m8_t vs2, size_t vl);

Reduction instructions

Only encoding the return type will cause naming collision for the reduction instruction intrinsics. The intrinsics encode the input vector type and the output vector type in the suffix.

vint8m1_t __riscv_vredsum_vs_i8m1_i8m1(vint8m1_t dest, vint8m1_t vs2,
                                       vint8m1_t vs1, size_t vl);
vint8m1_t __riscv_vredsum_vs_i8m2_i8m1(vint8m1_t dest, vint8m2_t vs2,
                                       vint8m1_t vs1, size_t vl);
vint8m1_t __riscv_vredsum_vs_i8m4_i8m1(vint8m1_t dest, vint8m4_t vs2,
                                       vint8m1_t vs1, size_t vl);
vint8m1_t __riscv_vredsum_vs_i8m8_i8m1(vint8m1_t dest, vint8m8_t vs2,
                                       vint8m1_t vs1, size_t vl);

vreinterpret, vlmul_trunc/vlmul_ext, and vset/vget

Only encoding the return type will cause naming collision for these pseudo instructions. The intrinsics encode the input vector type and the output vector type in the suffix.

The following shows an example with __riscv_vreinterpret_v_i32m1_*:

vfloat32m1_t __riscv_vreinterpret_v_i32m1_f32m1 (vint32m1_t src);
vuint32m1_t __riscv_vreinterpret_v_i32m1_u32m1 (vint32m1_t src);
vint8m1_t __riscv_vreinterpret_v_i32m1_i8m1 (vint32m1_t src);
vint16m1_t __riscv_vreinterpret_v_i32m1_i16m1 (vint32m1_t src);
vint64m1_t __riscv_vreinterpret_v_i32m1_i64m1 (vint32m1_t src);
vbool64_t __riscv_vreinterpret_v_i32m1_b64 (vint32m1_t src);
vbool32_t __riscv_vreinterpret_v_i32m1_b32 (vint32m1_t src);
vbool16_t __riscv_vreinterpret_v_i32m1_b16 (vint32m1_t src);
vbool8_t __riscv_vreinterpret_v_i32m1_b8 (vint32m1_t src);
vbool4_t __riscv_vreinterpret_v_i32m1_b4 (vint32m1_t src);

Implicit (Overloaded) naming scheme

The overloaded interface aims to let users pass values of different EEW and EMUL as the input operands, therefore hiding the EEW and EMUL encoded in the function name. The _rm suffix for explicit-frm intrinsics (Control of frm) is also hidden. The intrinsics under this scheme are the "overloaded intrinsics"; their counterparts, the "non-overloaded intrinsics", are defined under Explicit (Non-overloaded) naming scheme.

Taking the vector add (vadd) as an example, after stripping off the operand mnemonics and the encoded EEW and EMUL information, the intrinsics API provides the following overloaded interfaces.

vint32m1_t __riscv_vadd(vint32m1_t v0, vint32m1_t v1, size_t vl);
vint16m4_t __riscv_vadd(vint16m4_t v0, vint16m4_t v1, size_t vl);

Since the main intent is to let users pass values of different EEW and EMUL as input operands, the overloaded intrinsics do not hide the policy suffix. That is, the suffixes listed under Policy naming scheme are not hidden and are still encoded in the function name, except for the masked, tail-agnostic, mask-agnostic (vm=0, vta=1, vma=1) variant, whose _m suffix is omitted. Taking the vector floating-point add (vfadd) as an example, the intrinsics API provides the following overloaded interfaces.

vfloat32m1_t __riscv_vfadd(vbool32_t mask, vfloat32m1_t op1, vfloat32m1_t op2,
                           unsigned int frm, size_t vl);
vfloat16m4_t __riscv_vfadd(vbool4_t mask, vfloat16m4_t op1, vfloat16m4_t op2,
                           unsigned int frm, size_t vl);
vfloat32m1_t __riscv_vfadd_tu(vfloat32m1_t maskedoff, vfloat32m1_t op1,
                              vfloat32m1_t op2, size_t vl);
vfloat32m1_t __riscv_vfadd_tum(vbool32_t mask, vfloat32m1_t maskedoff,
                               vfloat32m1_t op1, vfloat32m1_t op2, size_t vl);
vfloat32m1_t __riscv_vfadd_tumu(vbool32_t mask, vfloat32m1_t maskedoff,
                                vfloat32m1_t op1, vfloat32m1_t op2, size_t vl);
vfloat32m1_t __riscv_vfadd_mu(vbool32_t mask, vfloat32m1_t maskedoff,
                              vfloat32m1_t op1, vfloat32m1_t op2, size_t vl);
vfloat32m1_t __riscv_vfadd_tu(vfloat32m1_t maskedoff, vfloat32m1_t op1,
                              vfloat32m1_t op2, unsigned int frm, size_t vl);
vfloat32m1_t __riscv_vfadd_tum(vbool32_t mask, vfloat32m1_t maskedoff,
                               vfloat32m1_t op1, vfloat32m1_t op2,
                               unsigned int frm, size_t vl);
vfloat32m1_t __riscv_vfadd_tumu(vbool32_t mask, vfloat32m1_t maskedoff,
                                vfloat32m1_t op1, vfloat32m1_t op2,
                                unsigned int frm, size_t vl);
vfloat32m1_t __riscv_vfadd_mu(vbool32_t mask, vfloat32m1_t maskedoff,
                              vfloat32m1_t op1, vfloat32m1_t op2,
                              unsigned int frm, size_t vl);

The naming scheme does not cover all intrinsics. Please see Exceptions in the implicit (overloaded) naming scheme for overloaded intrinsics with irregular naming patterns.

Due to the limitation of the C language (without the aid of features like C++ templates), some intrinsics do not have an overloaded version. Therefore these intrinsics do not possess a simplified, EEW/EMUL-hidden interface. Please see Un-supported intrinsics for implicit (overloaded) naming scheme for more detail.

Exceptions in the implicit (overloaded) naming scheme

The following intrinsics have an irregular naming pattern.

Widening instructions

Widening instruction intrinsics (e.g. vwadd) have the same return type but different parameters. The operand mnemonics are encoded into their overloaded versions to help distinguish them.

vint32m1_t __riscv_vwadd_vv (vint16mf2_t op1, vint16mf2_t op2, size_t vl);
vint32m1_t __riscv_vwadd_vx (vint16mf2_t op1, int16_t op2, size_t vl);
vint32m1_t __riscv_vwadd_wv (vint32m1_t op1, vint16mf2_t op2, size_t vl);
vint32m1_t __riscv_vwadd_wx (vint32m1_t op1, int16_t op2, size_t vl);

Type-convert instructions

Type-convert instruction intrinsics (e.g. vfcvt.f.x) encode the returning value mnemonics (e.g. vfcvt_f) into their overloaded variant to help distinguish them.

vfloat32m1_t __riscv_vfcvt_f_tu(vfloat32m1_t maskedoff, vint32m1_t src,
                                size_t vl);
vfloat32m1_t __riscv_vfcvt_f_tum(vbool32_t mask, vfloat32m1_t maskedoff,
                                 vint32m1_t src, size_t vl);
vfloat32m1_t __riscv_vfcvt_f_tumu(vbool32_t mask, vfloat32m1_t maskedoff,
                                  vint32m1_t src, size_t vl);
vfloat32m1_t __riscv_vfcvt_f_mu(vbool32_t mask, vfloat32m1_t maskedoff,
                                vint32m1_t src, size_t vl);
vfloat32m1_t __riscv_vfcvt_f_tu(vfloat32m1_t maskedoff, vint32m1_t src,
                                unsigned int frm, size_t vl);
vfloat32m1_t __riscv_vfcvt_f_tum(vbool32_t mask, vfloat32m1_t maskedoff,
                                 vint32m1_t src, unsigned int frm, size_t vl);
vfloat32m1_t __riscv_vfcvt_f_tumu(vbool32_t mask, vfloat32m1_t maskedoff,
                                  vint32m1_t src, unsigned int frm, size_t vl);
vfloat32m1_t __riscv_vfcvt_f_mu(vbool32_t mask, vfloat32m1_t maskedoff,
                                vint32m1_t src, unsigned int frm, size_t vl);

vreinterpret, LMUL truncate/extension, and vset/vget

These pseudo intrinsics (e.g. vreinterpret) encode the return type (e.g. __riscv_vreinterpret_b8) into their overloaded variant to help distinguish them.

vbool8_t __riscv_vreinterpret_b8 (vint8m1_t src);
vbool8_t __riscv_vreinterpret_b8 (vuint8m1_t src);
vbool8_t __riscv_vreinterpret_b8 (vint16m1_t src);
vbool8_t __riscv_vreinterpret_b8 (vuint16m1_t src);
vbool8_t __riscv_vreinterpret_b8 (vint32m1_t src);
vbool8_t __riscv_vreinterpret_b8 (vuint32m1_t src);
vbool8_t __riscv_vreinterpret_b8 (vint64m1_t src);
vbool8_t __riscv_vreinterpret_b8 (vuint64m1_t src);

Un-supported intrinsics for implicit (overloaded) naming scheme

Due to the limitation of the C language (without the aid of features like C++ templates), some intrinsics do not have an overloaded version. Intrinsics with characteristics of either of the following do not possess an overloaded version.

  • Intrinsics whose input arguments are all scalar types (e.g. vector load instruction intrinsics, vector move instruction intrinsics)

  • Intrinsics without any input argument (e.g. vmclr, vmset, vid)

  • Intrinsics with vector boolean input(s) that return a non-boolean vector type (e.g. viota)

Type system

The RVV intrinsics are designed to be strongly-typed. The intrinsics provide vreinterpret intrinsics to help users go across the strongly-typed scheme if necessary.

Non-mask (integer and floating-point) data types have SEW and LMUL encoded.

Integer types

The integer types have EEW and EMUL encoded into the type. The first row describes the EMUL and the first column describes the data type and element width of the scalar type.

Type with bold font is only available when ELEN >= 64 (that is, unavailable under Zve32*).

Table 1. Integer types

|===
| Types | EMUL=1/8 | EMUL=1/4 | EMUL=1/2 | EMUL=1 | EMUL=2 | EMUL=4 | EMUL=8

| int8_t | vint8mf8_t | vint8mf4_t | vint8mf2_t | vint8m1_t | vint8m2_t | vint8m4_t | vint8m8_t
| int16_t | N/A | vint16mf4_t | vint16mf2_t | vint16m1_t | vint16m2_t | vint16m4_t | vint16m8_t
| int32_t | N/A | N/A | vint32mf2_t | vint32m1_t | vint32m2_t | vint32m4_t | vint32m8_t
| int64_t | N/A | N/A | N/A | vint64m1_t | vint64m2_t | vint64m4_t | vint64m8_t
| uint8_t | vuint8mf8_t | vuint8mf4_t | vuint8mf2_t | vuint8m1_t | vuint8m2_t | vuint8m4_t | vuint8m8_t
| uint16_t | N/A | vuint16mf4_t | vuint16mf2_t | vuint16m1_t | vuint16m2_t | vuint16m4_t | vuint16m8_t
| uint32_t | N/A | N/A | vuint32mf2_t | vuint32m1_t | vuint32m2_t | vuint32m4_t | vuint32m8_t
| uint64_t | N/A | N/A | N/A | vuint64m1_t | vuint64m2_t | vuint64m4_t | vuint64m8_t
|===

Floating-point types

The floating-point types have EEW and EMUL encoded into the type. The first row describes the EMUL and the first column describes the data type and element width of the scalar type.

Floating-point types with element widths of 16 require the zvfh and zvfhmin extensions to be specified in the architecture.

Floating-point types with element widths of 32 require the zve32f extension to be specified in the architecture.

Floating-point types with element widths of 64 require the zve64d extension to be specified in the architecture.

Table 2. Floating-point types

|===
| Types | EMUL=1/8 | EMUL=1/4 | EMUL=1/2 | EMUL=1 | EMUL=2 | EMUL=4 | EMUL=8

| float16_t | N/A | vfloat16mf4_t | vfloat16mf2_t | vfloat16m1_t | vfloat16m2_t | vfloat16m4_t | vfloat16m8_t
| float32_t | N/A | N/A | vfloat32mf2_t | vfloat32m1_t | vfloat32m2_t | vfloat32m4_t | vfloat32m8_t
| float64_t | N/A | N/A | N/A | vfloat64m1_t | vfloat64m2_t | vfloat64m4_t | vfloat64m8_t
|===

Mask types

The mask types encode the ratio n that is derived from EEW/EMUL. The mask types represent mask register values that follow the Mask Register Layout 14.

Table 3. Mask types

|===
| Types | n = 1 | n = 2 | n = 4 | n = 8 | n = 16 | n = 32 | n = 64

| bool | vbool1_t | vbool2_t | vbool4_t | vbool8_t | vbool16_t | vbool32_t | vbool64_t
|===
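The ratio n can be computed directly from a data type's element width and LMUL. The sketch below uses a hypothetical helper, mask_ratio, and represents LMUL as a fraction lmul_num/lmul_den (so mf2 is 1/2). Returning 0 for combinations outside the table is a choice made for this sketch.

```c
#include <assert.h>

/* Hypothetical helper: returns n such that vbool<n>_t is the mask type
 * matching a data type with element width sew and LMUL lmul_num/lmul_den.
 * Returns 0 when n = sew / LMUL is not one of 1, 2, 4, ..., 64. */
unsigned mask_ratio(unsigned sew, unsigned lmul_num, unsigned lmul_den)
{
    assert(lmul_num != 0 && lmul_den != 0);
    unsigned scaled = sew * lmul_den;      /* sew / (num/den) = sew*den/num */
    if (scaled % lmul_num != 0)
        return 0;
    unsigned n = scaled / lmul_num;
    int is_pow2 = n != 0 && (n & (n - 1)) == 0;
    return (is_pow2 && n <= 64) ? n : 0;
}
```

For example, vint32m1_t (SEW=32, LMUL=1) pairs with vbool32_t and vfloat16m4_t (SEW=16, LMUL=4) pairs with vbool4_t, consistent with the prototypes shown earlier in this document.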

Tuple type

The tuple types encode SEW, LMUL, and NFIELD 16 into the data type.

These types are used by the segment load/store instruction intrinsics. All the types listed in Integer types and Floating-point types have tuple types. The available combinations of LMUL and NFIELD follow the v-spec restriction EMUL * NFIELDS ≤ 8.

The full list of types is attached in the Appendix.
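The restriction on tuple types can be sketched as a simple predicate. The helper tuple_type_is_valid is hypothetical. It represents LMUL as a fraction lmul_num/lmul_den and assumes that tuple types use NFIELD values from 2 to 8. The Appendix remains the authoritative list.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: checks EMUL * NFIELDS <= 8 for a tuple type with
 * LMUL = lmul_num/lmul_den. The 2..8 range for nfields is an assumption
 * of this sketch. */
bool tuple_type_is_valid(unsigned lmul_num, unsigned lmul_den,
                         unsigned nfields)
{
    assert(lmul_num != 0 && lmul_den != 0);
    if (nfields < 2 || nfields > 8)
        return false;
    /* (num/den) * nfields <= 8, rearranged to avoid division. */
    return lmul_num * nfields <= 8 * lmul_den;
}
```

Under this check, LMUL=2 allows at most 4 fields and LMUL=4 allows only 2, while LMUL=8 admits no tuple type at all.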

Pseudo intrinsics

The intrinsics provide extra utility functions to help users manipulate across the RVV intrinsic types. These functions are called "pseudo intrinsics". These pseudo intrinsics do not represent any real instructions.

vsetvl/vsetvlmax

vreinterpret

vundefined

vget/vset

vlmul_trunc/vlmul_ext

Programming Notes

Assembly generated from the intrinsics

Some users may expect the intrinsics to translate directly into instructions that appear in the assembly. However, the intrinsics are interfaces that expose the vector instruction semantics, and the compiler is free to optimize them out if there is an opportunity.

Bookkeeping of vtype in the compiler

The compiler should guarantee that the correct vtype is set given the EEW and EMUL specified in the intrinsic function name suffix, and the data type of the operand(s).

Strided load/store with stride of 0

The V extension spec mentions 15 that a strided load/store instruction with a stride of 0 may either perform all memory accesses or perform fewer memory operations. Since needing all memory accesses isn't likely to be common, the compiler implementation is allowed to generate fewer memory operations for strided load/store intrinsics.

In other words, the compiler does not guarantee that a strided load/store intrinsic with a stride of 0 generates an instruction that performs all memory accesses. If the user needs all memory accesses to be performed, they should use an indexed load/store intrinsic with all-zero indices.
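The indexed alternative can be sketched as below. The helper name broadcast_i32 is hypothetical, and the scalar fallback only reproduces the resulting values, not the memory-access behavior. On the RVV path, every element is loaded from the same address through an unordered indexed load with all-zero byte offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __riscv_v_intrinsic
#include <riscv_vector.h>
#endif /* __riscv_v_intrinsic */

/* Hypothetical helper: fill dst[0..n) with the value stored at *src. */
void broadcast_i32(int32_t *dst, const int32_t *src, size_t n)
{
    assert(src != NULL && (n == 0 || dst != NULL));
#ifdef __riscv_v_intrinsic
    while (n > 0) {
        size_t vl = __riscv_vsetvl_e32m1(n);
        /* A stride-0 load, __riscv_vlse32_v_i32m1(src, 0, vl), would yield
         * the same values but may be compiled to fewer memory operations.
         * The indexed form with zero offsets is used instead. */
        vuint32m1_t zero_idx = __riscv_vmv_v_x_u32m1(0, vl);
        vint32m1_t v = __riscv_vluxei32_v_i32m1(src, zero_idx, vl);
        __riscv_vse32_v_i32m1(dst, v, vl);
        dst += vl;
        n -= vl;
    }
#else
    /* Scalar fallback: same values, no claim about access count. */
    for (size_t i = 0; i < n; ++i)
        dst[i] = *src;
#endif
}
```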

References

4 Section 3.4.1 (Vector selected element width vsew[2:0]) in v-spec 0

5 Section 3.4.2 (Vector Register Grouping (vlmul[2:0])) in v-spec 0

6 Section 3.4.3 (Vector Tail Agnostic and Vector Mask Agnostic vta and vma) in v-spec 0

7 Section 5.3 (Vector Masking) in v-spec 0

8 Section 3.8 (Vector Fixed-Point Rounding Mode Register vxrm) in v-spec 0

11 Section 3.5 (Vector Length Register) in v-spec 0

12 Section 3.4.2 in v-spec 0

13 Section 11.13, 11.14, 13.6, 13.7 in v-spec 0

14 Section 4.5 (Mask Register Layout) in v-spec 0

15 Section 7.5 in v-spec 0

16 Section 7.8 in v-spec 0

17 Section 5.2 (Vector Operands) in v-spec 0