[RISCV] Expand vp.fma, vp.fmuladd, vp.fneg, vp.fpext #190589
Conversation
Part of the work to remove trivial VP intrinsics from the RISC-V backend, see https://discourse.llvm.org/t/rfc-remove-codegen-support-for-trivial-vp-intrinsics-in-the-risc-v-backend/87999

This PR expands four intrinsics before codegen, but doesn't remove the codegen handling yet, as both DAGCombiner and type legalization can still create these nodes.

vp.fneg and vp.fpext are expanded in lockstep with the fma/fmuladd intrinsics since some test cases for vfmacc etc. also use them, and mixing dynamic and constant VLs causes some of the more complex patterns to be missed.

The fixed-length VP vfmacc, vfmsac, vfnmacc and vfnmsac tests also need to replace the EVL of the vp.merge/vp.select with an immediate, otherwise the resulting vmerge.vvm can't be folded into them. This only happens for fixed-vector intrinsics with no passthru, since we end up with a constant VL from the fixed vector and a dynamic VL from the vp.merge, which prevents folding.

As far as I'm aware, we don't emit fixed-length vp.merges: we only emit vp.merge in the loop vectorizer, and only with EVL tail folding, which requires a scalable VF.
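For illustration, a minimal sketch in LLVM IR of the kind of pre-codegen expansion described above (the <4 x float> types and value names are hypothetical, not taken from the patch): because lanes a VP intrinsic masks off or cuts past its EVL yield poison, a trivial operation like vp.fma can be replaced by its unpredicated counterpart, dropping the mask and EVL.

; Before: a trivial VP intrinsic carrying a mask and an EVL.
%v = call <4 x float> @llvm.vp.fma.v4f32(<4 x float> %a, <4 x float> %b, <4 x float> %c, <4 x i1> %m, i32 %evl)

; After: masked-off lanes and lanes past %evl yield poison anyway, so the
; same result can be computed unpredicated on the full fixed-length vector.
%v.expanded = call <4 x float> @llvm.fma.v4f32(<4 x float> %a, <4 x float> %b, <4 x float> %c)

This is what shows up in the test diffs below: the masked v0.t forms disappear and the dynamic vl in a0 becomes an immediate taken from the fixed vector type.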
@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Patch is 1.87 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/190589.diff

22 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
index 38c859a08c41b..4612a1c917f2e 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
@@ -381,11 +381,7 @@ class RISCVTTIImpl final : public BasicTTIImplBase<RISCVTTIImpl> {
Intrinsic::vp_fadd,
Intrinsic::vp_fcmp,
Intrinsic::vp_fdiv,
- Intrinsic::vp_fma,
Intrinsic::vp_fmul,
- Intrinsic::vp_fmuladd,
- Intrinsic::vp_fneg,
- Intrinsic::vp_fpext,
Intrinsic::vp_fptosi,
Intrinsic::vp_fptoui,
Intrinsic::vp_fptrunc,
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fpext-vp.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fpext-vp.ll
index 465b166826a37..ee8c8905fb2fa 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fpext-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fpext-vp.ll
@@ -7,8 +7,8 @@
define <2 x float> @vfpext_v2f16_v2f32(<2 x half> %a, <2 x i1> %m, i32 zeroext %vl) {
; CHECK-LABEL: vfpext_v2f16_v2f32:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
-; CHECK-NEXT: vfwcvt.f.f.v v9, v8, v0.t
+; CHECK-NEXT: vsetivli zero, 2, e16, mf4, ta, ma
+; CHECK-NEXT: vfwcvt.f.f.v v9, v8
; CHECK-NEXT: vmv1r.v v8, v9
; CHECK-NEXT: ret
%v = call <2 x float> @llvm.vp.fpext.v2f32.v2f16(<2 x half> %a, <2 x i1> %m, i32 %vl)
@@ -18,7 +18,7 @@ define <2 x float> @vfpext_v2f16_v2f32(<2 x half> %a, <2 x i1> %m, i32 zeroext %
define <2 x float> @vfpext_v2f16_v2f32_unmasked(<2 x half> %a, i32 zeroext %vl) {
; CHECK-LABEL: vfpext_v2f16_v2f32_unmasked:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
+; CHECK-NEXT: vsetivli zero, 2, e16, mf4, ta, ma
; CHECK-NEXT: vfwcvt.f.f.v v9, v8
; CHECK-NEXT: vmv1r.v v8, v9
; CHECK-NEXT: ret
@@ -29,10 +29,10 @@ define <2 x float> @vfpext_v2f16_v2f32_unmasked(<2 x half> %a, i32 zeroext %vl)
define <2 x double> @vfpext_v2f16_v2f64(<2 x half> %a, <2 x i1> %m, i32 zeroext %vl) {
; CHECK-LABEL: vfpext_v2f16_v2f64:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
-; CHECK-NEXT: vfwcvt.f.f.v v9, v8, v0.t
+; CHECK-NEXT: vsetivli zero, 2, e16, mf4, ta, ma
+; CHECK-NEXT: vfwcvt.f.f.v v9, v8
; CHECK-NEXT: vsetvli zero, zero, e32, mf2, ta, ma
-; CHECK-NEXT: vfwcvt.f.f.v v8, v9, v0.t
+; CHECK-NEXT: vfwcvt.f.f.v v8, v9
; CHECK-NEXT: ret
%v = call <2 x double> @llvm.vp.fpext.v2f64.v2f16(<2 x half> %a, <2 x i1> %m, i32 %vl)
ret <2 x double> %v
@@ -41,7 +41,7 @@ define <2 x double> @vfpext_v2f16_v2f64(<2 x half> %a, <2 x i1> %m, i32 zeroext
define <2 x double> @vfpext_v2f16_v2f64_unmasked(<2 x half> %a, i32 zeroext %vl) {
; CHECK-LABEL: vfpext_v2f16_v2f64_unmasked:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
+; CHECK-NEXT: vsetivli zero, 2, e16, mf4, ta, ma
; CHECK-NEXT: vfwcvt.f.f.v v9, v8
; CHECK-NEXT: vsetvli zero, zero, e32, mf2, ta, ma
; CHECK-NEXT: vfwcvt.f.f.v v8, v9
@@ -53,8 +53,8 @@ define <2 x double> @vfpext_v2f16_v2f64_unmasked(<2 x half> %a, i32 zeroext %vl)
define <2 x double> @vfpext_v2f32_v2f64(<2 x float> %a, <2 x i1> %m, i32 zeroext %vl) {
; CHECK-LABEL: vfpext_v2f32_v2f64:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a0, e32, mf2, ta, ma
-; CHECK-NEXT: vfwcvt.f.f.v v9, v8, v0.t
+; CHECK-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-NEXT: vfwcvt.f.f.v v9, v8
; CHECK-NEXT: vmv1r.v v8, v9
; CHECK-NEXT: ret
%v = call <2 x double> @llvm.vp.fpext.v2f64.v2f32(<2 x float> %a, <2 x i1> %m, i32 %vl)
@@ -64,7 +64,7 @@ define <2 x double> @vfpext_v2f32_v2f64(<2 x float> %a, <2 x i1> %m, i32 zeroext
define <2 x double> @vfpext_v2f32_v2f64_unmasked(<2 x float> %a, i32 zeroext %vl) {
; CHECK-LABEL: vfpext_v2f32_v2f64_unmasked:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a0, e32, mf2, ta, ma
+; CHECK-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
; CHECK-NEXT: vfwcvt.f.f.v v9, v8
; CHECK-NEXT: vmv1r.v v8, v9
; CHECK-NEXT: ret
@@ -75,9 +75,9 @@ define <2 x double> @vfpext_v2f32_v2f64_unmasked(<2 x float> %a, i32 zeroext %vl
define <15 x double> @vfpext_v15f32_v15f64(<15 x float> %a, <15 x i1> %m, i32 zeroext %vl) {
; CHECK-LABEL: vfpext_v15f32_v15f64:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a0, e32, m4, ta, ma
+; CHECK-NEXT: vsetivli zero, 16, e32, m4, ta, ma
; CHECK-NEXT: vmv4r.v v16, v8
-; CHECK-NEXT: vfwcvt.f.f.v v8, v16, v0.t
+; CHECK-NEXT: vfwcvt.f.f.v v8, v16
; CHECK-NEXT: ret
%v = call <15 x double> @llvm.vp.fpext.v15f64.v15f32(<15 x float> %a, <15 x i1> %m, i32 %vl)
ret <15 x double> %v
@@ -86,27 +86,13 @@ define <15 x double> @vfpext_v15f32_v15f64(<15 x float> %a, <15 x i1> %m, i32 ze
define <32 x double> @vfpext_v32f32_v32f64(<32 x float> %a, <32 x i1> %m, i32 zeroext %vl) {
; CHECK-LABEL: vfpext_v32f32_v32f64:
; CHECK: # %bb.0:
-; CHECK-NEXT: li a2, 16
-; CHECK-NEXT: vsetivli zero, 2, e8, mf4, ta, ma
-; CHECK-NEXT: vslidedown.vi v24, v0, 2
-; CHECK-NEXT: mv a1, a0
-; CHECK-NEXT: bltu a0, a2, .LBB7_2
-; CHECK-NEXT: # %bb.1:
-; CHECK-NEXT: li a1, 16
-; CHECK-NEXT: .LBB7_2:
-; CHECK-NEXT: vsetvli zero, a1, e32, m4, ta, ma
-; CHECK-NEXT: vfwcvt.f.f.v v16, v8, v0.t
-; CHECK-NEXT: addi a1, a0, -16
-; CHECK-NEXT: sltu a0, a0, a1
-; CHECK-NEXT: addi a0, a0, -1
-; CHECK-NEXT: and a0, a0, a1
+; CHECK-NEXT: vsetivli zero, 16, e32, m4, ta, ma
+; CHECK-NEXT: vfwcvt.f.f.v v24, v8
; CHECK-NEXT: vsetivli zero, 16, e32, m8, ta, ma
; CHECK-NEXT: vslidedown.vi v8, v8, 16
-; CHECK-NEXT: vmv1r.v v0, v24
-; CHECK-NEXT: vsetvli zero, a0, e32, m4, ta, ma
-; CHECK-NEXT: vfwcvt.f.f.v v24, v8, v0.t
-; CHECK-NEXT: vmv8r.v v8, v16
-; CHECK-NEXT: vmv8r.v v16, v24
+; CHECK-NEXT: vsetivli zero, 16, e32, m4, ta, ma
+; CHECK-NEXT: vfwcvt.f.f.v v16, v8
+; CHECK-NEXT: vmv8r.v v8, v24
; CHECK-NEXT: ret
%v = call <32 x double> @llvm.vp.fpext.v32f64.v32f32(<32 x float> %a, <32 x i1> %m, i32 %vl)
ret <32 x double> %v
@@ -115,8 +101,8 @@ define <32 x double> @vfpext_v32f32_v32f64(<32 x float> %a, <32 x i1> %m, i32 ze
define <2 x float> @vfpext_v2bf16_v2f32(<2 x bfloat> %a, <2 x i1> %m, i32 zeroext %vl) {
; CHECK-LABEL: vfpext_v2bf16_v2f32:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
-; CHECK-NEXT: vfwcvtbf16.f.f.v v9, v8, v0.t
+; CHECK-NEXT: vsetivli zero, 2, e16, mf4, ta, ma
+; CHECK-NEXT: vfwcvtbf16.f.f.v v9, v8
; CHECK-NEXT: vmv1r.v v8, v9
; CHECK-NEXT: ret
%v = call <2 x float> @llvm.vp.fpext.v2f32.v2bf16(<2 x bfloat> %a, <2 x i1> %m, i32 %vl)
@@ -126,7 +112,7 @@ define <2 x float> @vfpext_v2bf16_v2f32(<2 x bfloat> %a, <2 x i1> %m, i32 zeroex
define <2 x float> @vfpext_v2bf16_v2f32_unmasked(<2 x bfloat> %a, i32 zeroext %vl) {
; CHECK-LABEL: vfpext_v2bf16_v2f32_unmasked:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
+; CHECK-NEXT: vsetivli zero, 2, e16, mf4, ta, ma
; CHECK-NEXT: vfwcvtbf16.f.f.v v9, v8
; CHECK-NEXT: vmv1r.v v8, v9
; CHECK-NEXT: ret
@@ -137,10 +123,10 @@ define <2 x float> @vfpext_v2bf16_v2f32_unmasked(<2 x bfloat> %a, i32 zeroext %v
define <2 x double> @vfpext_v2bf16_v2f64(<2 x bfloat> %a, <2 x i1> %m, i32 zeroext %vl) {
; CHECK-LABEL: vfpext_v2bf16_v2f64:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
-; CHECK-NEXT: vfwcvtbf16.f.f.v v9, v8, v0.t
+; CHECK-NEXT: vsetivli zero, 2, e16, mf4, ta, ma
+; CHECK-NEXT: vfwcvtbf16.f.f.v v9, v8
; CHECK-NEXT: vsetvli zero, zero, e32, mf2, ta, ma
-; CHECK-NEXT: vfwcvt.f.f.v v8, v9, v0.t
+; CHECK-NEXT: vfwcvt.f.f.v v8, v9
; CHECK-NEXT: ret
%v = call <2 x double> @llvm.vp.fpext.v2f64.v2bf16(<2 x bfloat> %a, <2 x i1> %m, i32 %vl)
ret <2 x double> %v
@@ -149,7 +135,7 @@ define <2 x double> @vfpext_v2bf16_v2f64(<2 x bfloat> %a, <2 x i1> %m, i32 zeroe
define <2 x double> @vfpext_v2bf16_v2f64_unmasked(<2 x bfloat> %a, i32 zeroext %vl) {
; CHECK-LABEL: vfpext_v2bf16_v2f64_unmasked:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
+; CHECK-NEXT: vsetivli zero, 2, e16, mf4, ta, ma
; CHECK-NEXT: vfwcvtbf16.f.f.v v9, v8
; CHECK-NEXT: vsetvli zero, zero, e32, mf2, ta, ma
; CHECK-NEXT: vfwcvt.f.f.v v8, v9
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfma-vp.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfma-vp.ll
index f28b970f48ff7..53eef59f71d6d 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfma-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfma-vp.ll
@@ -11,21 +11,20 @@
define <2 x half> @vfma_vv_v2f16(<2 x half> %va, <2 x half> %b, <2 x half> %c, <2 x i1> %m, i32 zeroext %evl) {
; ZVFH-LABEL: vfma_vv_v2f16:
; ZVFH: # %bb.0:
-; ZVFH-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
-; ZVFH-NEXT: vfmadd.vv v9, v8, v10, v0.t
-; ZVFH-NEXT: vmv1r.v v8, v9
+; ZVFH-NEXT: vsetivli zero, 2, e16, mf4, ta, ma
+; ZVFH-NEXT: vfmadd.vv v8, v9, v10
; ZVFH-NEXT: ret
;
; ZVFHMIN-LABEL: vfma_vv_v2f16:
; ZVFHMIN: # %bb.0:
-; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
-; ZVFHMIN-NEXT: vfwcvt.f.f.v v11, v10, v0.t
-; ZVFHMIN-NEXT: vfwcvt.f.f.v v10, v8, v0.t
-; ZVFHMIN-NEXT: vfwcvt.f.f.v v12, v9, v0.t
+; ZVFHMIN-NEXT: vsetivli zero, 2, e16, mf4, ta, ma
+; ZVFHMIN-NEXT: vfwcvt.f.f.v v11, v10
+; ZVFHMIN-NEXT: vfwcvt.f.f.v v10, v8
+; ZVFHMIN-NEXT: vfwcvt.f.f.v v12, v9
; ZVFHMIN-NEXT: vsetvli zero, zero, e32, mf2, ta, ma
-; ZVFHMIN-NEXT: vfmadd.vv v12, v10, v11, v0.t
+; ZVFHMIN-NEXT: vfmadd.vv v12, v10, v11
; ZVFHMIN-NEXT: vsetvli zero, zero, e16, mf4, ta, ma
-; ZVFHMIN-NEXT: vfncvt.f.f.w v8, v12, v0.t
+; ZVFHMIN-NEXT: vfncvt.f.f.w v8, v12
; ZVFHMIN-NEXT: ret
%v = call <2 x half> @llvm.vp.fma.v2f16(<2 x half> %va, <2 x half> %b, <2 x half> %c, <2 x i1> %m, i32 %evl)
ret <2 x half> %v
@@ -34,13 +33,13 @@ define <2 x half> @vfma_vv_v2f16(<2 x half> %va, <2 x half> %b, <2 x half> %c, <
define <2 x half> @vfma_vv_v2f16_unmasked(<2 x half> %va, <2 x half> %b, <2 x half> %c, i32 zeroext %evl) {
; ZVFH-LABEL: vfma_vv_v2f16_unmasked:
; ZVFH: # %bb.0:
-; ZVFH-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
+; ZVFH-NEXT: vsetivli zero, 2, e16, mf4, ta, ma
; ZVFH-NEXT: vfmadd.vv v8, v9, v10
; ZVFH-NEXT: ret
;
; ZVFHMIN-LABEL: vfma_vv_v2f16_unmasked:
; ZVFHMIN: # %bb.0:
-; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
+; ZVFHMIN-NEXT: vsetivli zero, 2, e16, mf4, ta, ma
; ZVFHMIN-NEXT: vfwcvt.f.f.v v11, v10
; ZVFHMIN-NEXT: vfwcvt.f.f.v v10, v8
; ZVFHMIN-NEXT: vfwcvt.f.f.v v12, v9
@@ -56,24 +55,22 @@ define <2 x half> @vfma_vv_v2f16_unmasked(<2 x half> %va, <2 x half> %b, <2 x ha
define <2 x half> @vfma_vf_v2f16(<2 x half> %va, half %b, <2 x half> %vc, <2 x i1> %m, i32 zeroext %evl) {
; ZVFH-LABEL: vfma_vf_v2f16:
; ZVFH: # %bb.0:
-; ZVFH-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
-; ZVFH-NEXT: vfmadd.vf v8, fa0, v9, v0.t
+; ZVFH-NEXT: vsetivli zero, 2, e16, mf4, ta, ma
+; ZVFH-NEXT: vfmadd.vf v8, fa0, v9
; ZVFH-NEXT: ret
;
; ZVFHMIN-LABEL: vfma_vf_v2f16:
; ZVFHMIN: # %bb.0:
-; ZVFHMIN-NEXT: fmv.x.h a1, fa0
-; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
-; ZVFHMIN-NEXT: vfwcvt.f.f.v v10, v9, v0.t
+; ZVFHMIN-NEXT: fmv.x.h a0, fa0
; ZVFHMIN-NEXT: vsetivli zero, 2, e16, mf4, ta, ma
-; ZVFHMIN-NEXT: vmv.v.x v9, a1
-; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
-; ZVFHMIN-NEXT: vfwcvt.f.f.v v11, v8, v0.t
-; ZVFHMIN-NEXT: vfwcvt.f.f.v v12, v9, v0.t
+; ZVFHMIN-NEXT: vfwcvt.f.f.v v10, v9
+; ZVFHMIN-NEXT: vmv.v.x v9, a0
+; ZVFHMIN-NEXT: vfwcvt.f.f.v v11, v8
+; ZVFHMIN-NEXT: vfwcvt.f.f.v v12, v9
; ZVFHMIN-NEXT: vsetvli zero, zero, e32, mf2, ta, ma
-; ZVFHMIN-NEXT: vfmadd.vv v12, v11, v10, v0.t
+; ZVFHMIN-NEXT: vfmadd.vv v12, v11, v10
; ZVFHMIN-NEXT: vsetvli zero, zero, e16, mf4, ta, ma
-; ZVFHMIN-NEXT: vfncvt.f.f.w v8, v12, v0.t
+; ZVFHMIN-NEXT: vfncvt.f.f.w v8, v12
; ZVFHMIN-NEXT: ret
%elt.head = insertelement <2 x half> poison, half %b, i32 0
%vb = shufflevector <2 x half> %elt.head, <2 x half> poison, <2 x i32> zeroinitializer
@@ -84,18 +81,16 @@ define <2 x half> @vfma_vf_v2f16(<2 x half> %va, half %b, <2 x half> %vc, <2 x i
define <2 x half> @vfma_vf_v2f16_unmasked(<2 x half> %va, half %b, <2 x half> %vc, i32 zeroext %evl) {
; ZVFH-LABEL: vfma_vf_v2f16_unmasked:
; ZVFH: # %bb.0:
-; ZVFH-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
+; ZVFH-NEXT: vsetivli zero, 2, e16, mf4, ta, ma
; ZVFH-NEXT: vfmadd.vf v8, fa0, v9
; ZVFH-NEXT: ret
;
; ZVFHMIN-LABEL: vfma_vf_v2f16_unmasked:
; ZVFHMIN: # %bb.0:
-; ZVFHMIN-NEXT: fmv.x.h a1, fa0
-; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
-; ZVFHMIN-NEXT: vfwcvt.f.f.v v10, v9
+; ZVFHMIN-NEXT: fmv.x.h a0, fa0
; ZVFHMIN-NEXT: vsetivli zero, 2, e16, mf4, ta, ma
-; ZVFHMIN-NEXT: vmv.v.x v9, a1
-; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
+; ZVFHMIN-NEXT: vfwcvt.f.f.v v10, v9
+; ZVFHMIN-NEXT: vmv.v.x v9, a0
; ZVFHMIN-NEXT: vfwcvt.f.f.v v11, v8
; ZVFHMIN-NEXT: vfwcvt.f.f.v v12, v9
; ZVFHMIN-NEXT: vsetvli zero, zero, e32, mf2, ta, ma
@@ -112,21 +107,20 @@ define <2 x half> @vfma_vf_v2f16_unmasked(<2 x half> %va, half %b, <2 x half> %v
define <4 x half> @vfma_vv_v4f16(<4 x half> %va, <4 x half> %b, <4 x half> %c, <4 x i1> %m, i32 zeroext %evl) {
; ZVFH-LABEL: vfma_vv_v4f16:
; ZVFH: # %bb.0:
-; ZVFH-NEXT: vsetvli zero, a0, e16, mf2, ta, ma
-; ZVFH-NEXT: vfmadd.vv v9, v8, v10, v0.t
-; ZVFH-NEXT: vmv1r.v v8, v9
+; ZVFH-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
+; ZVFH-NEXT: vfmadd.vv v8, v9, v10
; ZVFH-NEXT: ret
;
; ZVFHMIN-LABEL: vfma_vv_v4f16:
; ZVFHMIN: # %bb.0:
-; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf2, ta, ma
-; ZVFHMIN-NEXT: vfwcvt.f.f.v v11, v10, v0.t
-; ZVFHMIN-NEXT: vfwcvt.f.f.v v10, v8, v0.t
-; ZVFHMIN-NEXT: vfwcvt.f.f.v v12, v9, v0.t
+; ZVFHMIN-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
+; ZVFHMIN-NEXT: vfwcvt.f.f.v v11, v10
+; ZVFHMIN-NEXT: vfwcvt.f.f.v v10, v8
+; ZVFHMIN-NEXT: vfwcvt.f.f.v v12, v9
; ZVFHMIN-NEXT: vsetvli zero, zero, e32, m1, ta, ma
-; ZVFHMIN-NEXT: vfmadd.vv v12, v10, v11, v0.t
+; ZVFHMIN-NEXT: vfmadd.vv v12, v10, v11
; ZVFHMIN-NEXT: vsetvli zero, zero, e16, mf2, ta, ma
-; ZVFHMIN-NEXT: vfncvt.f.f.w v8, v12, v0.t
+; ZVFHMIN-NEXT: vfncvt.f.f.w v8, v12
; ZVFHMIN-NEXT: ret
%v = call <4 x half> @llvm.vp.fma.v4f16(<4 x half> %va, <4 x half> %b, <4 x half> %c, <4 x i1> %m, i32 %evl)
ret <4 x half> %v
@@ -135,13 +129,13 @@ define <4 x half> @vfma_vv_v4f16(<4 x half> %va, <4 x half> %b, <4 x half> %c, <
define <4 x half> @vfma_vv_v4f16_unmasked(<4 x half> %va, <4 x half> %b, <4 x half> %c, i32 zeroext %evl) {
; ZVFH-LABEL: vfma_vv_v4f16_unmasked:
; ZVFH: # %bb.0:
-; ZVFH-NEXT: vsetvli zero, a0, e16, mf2, ta, ma
+; ZVFH-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
; ZVFH-NEXT: vfmadd.vv v8, v9, v10
; ZVFH-NEXT: ret
;
; ZVFHMIN-LABEL: vfma_vv_v4f16_unmasked:
; ZVFHMIN: # %bb.0:
-; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf2, ta, ma
+; ZVFHMIN-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
; ZVFHMIN-NEXT: vfwcvt.f.f.v v11, v10
; ZVFHMIN-NEXT: vfwcvt.f.f.v v10, v8
; ZVFHMIN-NEXT: vfwcvt.f.f.v v12, v9
@@ -157,24 +151,22 @@ define <4 x half> @vfma_vv_v4f16_unmasked(<4 x half> %va, <4 x half> %b, <4 x ha
define <4 x half> @vfma_vf_v4f16(<4 x half> %va, half %b, <4 x half> %vc, <4 x i1> %m, i32 zeroext %evl) {
; ZVFH-LABEL: vfma_vf_v4f16:
; ZVFH: # %bb.0:
-; ZVFH-NEXT: vsetvli zero, a0, e16, mf2, ta, ma
-; ZVFH-NEXT: vfmadd.vf v8, fa0, v9, v0.t
+; ZVFH-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
+; ZVFH-NEXT: vfmadd.vf v8, fa0, v9
; ZVFH-NEXT: ret
;
; ZVFHMIN-LABEL: vfma_vf_v4f16:
; ZVFHMIN: # %bb.0:
-; ZVFHMIN-NEXT: fmv.x.h a1, fa0
-; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf2, ta, ma
-; ZVFHMIN-NEXT: vfwcvt.f.f.v v10, v9, v0.t
+; ZVFHMIN-NEXT: fmv.x.h a0, fa0
; ZVFHMIN-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
-; ZVFHMIN-NEXT: vmv.v.x v9, a1
-; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf2, ta, ma
-; ZVFHMIN-NEXT: vfwcvt.f.f.v v11, v8, v0.t
-; ZVFHMIN-NEXT: vfwcvt.f.f.v v12, v9, v0.t
+; ZVFHMIN-NEXT: vfwcvt.f.f.v v10, v9
+; ZVFHMIN-NEXT: vmv.v.x v9, a0
+; ZVFHMIN-NEXT: vfwcvt.f.f.v v11, v8
+; ZVFHMIN-NEXT: vfwcvt.f.f.v v12, v9
; ZVFHMIN-NEXT: vsetvli zero, zero, e32, m1, ta, ma
-; ZVFHMIN-NEXT: vfmadd.vv v12, v11, v10, v0.t
+; ZVFHMIN-NEXT: vfmadd.vv v12, v11, v10
; ZVFHMIN-NEXT: vsetvli zero, zero, e16, mf2, ta, ma
-; ZVFHMIN-NEXT: vfncvt.f.f.w v8, v12, v0.t
+; ZVFHMIN-NEXT: vfncvt.f.f.w v8, v12
; ZVFHMIN-NEXT: ret
%elt.head = insertelement <4 x half> poison, half %b, i32 0
%vb = shufflevector <4 x half> %elt.head, <4 x half> poison, <4 x i32> zeroinitializer
@@ -185,18 +177,16 @@ define <4 x half> @vfma_vf_v4f16(<4 x half> %va, half %b, <4 x half> %vc, <4 x i
define <4 x half> @vfma_vf_v4f16_unmasked(<4 x half> %va, half %b, <4 x half> %vc, i32 zeroext %evl) {
; ZVFH-LABEL: vfma_vf_v4f16_unmasked:
; ZVFH: # %bb.0:
-; ZVFH-NEXT: vsetvli zero, a0, e16, mf2, ta, ma
+; ZVFH-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
; ZVFH-NEXT: vfmadd.vf v8, fa0, v9
; ZVFH-NEXT: ret
;
; ZVFHMIN-LABEL: vfma_vf_v4f16_unmasked:
; ZVFHMIN: # %bb.0:
-; ZVFHMIN-NEXT: fmv.x.h a1, fa0
-; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf2, ta, ma
-; ZVFHMIN-NEXT: vfwcvt.f.f.v v10, v9
+; ZVFHMIN-NEXT: fmv.x.h a0, fa0
; ZVFHMIN-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
-; ZVFHMIN-NEXT: vmv.v.x v9, a1
-; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf2, ta, ma
+; ZVFHMIN-NEXT: vfwcvt.f.f.v v10, v9
+; ZVFHMIN-NEXT: vmv.v.x v9, a0
; ZVFHMIN-NEXT: vfwcvt.f.f.v v11, v8
; ZVFHMIN-NEXT: vfwcvt.f.f.v v12, v9
; ZVFHMIN-NEXT: vsetvli zero, zero, e32, m1, ta, ma
@@ -213,21 +203,20 @@ define <4 x half> @vfma_vf_v4f16_unmasked(<4 x half> %va, half %b, <4 x half> %v
define <8 x half> @vfma_vv_v8f16(<8 x half> %va, <8 x half> %b, <8 x half> %c, <8 x i1> %m, i32 zeroext %evl) {
; ZVFH-LABEL: vfma_vv_v8f16:
; ZVFH: # %bb.0:
-; ZVFH-NEXT: vsetvli zero, a0, e16, m1, ta, ma
-; ZVFH-NEXT: vfmadd.vv v9, v8, v10, v0.t
-; ZVFH-NEXT: vmv.v.v v8, v9
+; ZVFH-NEXT: vsetivli zero, 8, e16, m1, ta, ma
+; ZVFH-NEXT: vfmadd.vv v8, v9, v10
; ZVFH-NEXT: ret
;
; ZVFHMIN-LABEL: vfma_vv_v8f16:
; ZVFHMIN: # %bb.0:
-; ZVFHMIN-NEXT: vsetvli zero, a0, e16, m1, ta, ma
-; ZVFHMIN-NEXT: vfwcvt.f.f.v v12, v10, v0.t
-; ZVFHMIN-NEXT: vfwcvt.f.f.v v10, v8, v0.t
-; ZVFHMIN-NEXT: vfwcvt.f.f.v v14, v9, v0.t
+; ZVFHMIN-NEXT: vsetivli zero, 8, e16, m1, ta, ma
+; ZVFHMIN-NEXT: vfwcvt.f.f.v v12, v10
+; ZVFHMIN-NEXT: vfwcvt.f.f.v v10, v8
+; ZVFHMIN-NEXT: vfwcvt.f.f.v v14, v9
; ZVFHMIN-NEXT: vsetvli zero, zero, e32, m2, ta, ma
-; ZVFHMIN-NEXT: vfmadd.vv v14, v10, v12, v0.t
+; ZVFHMIN-NEXT: vfmadd.vv v14, v10, v12
; ZVFHMIN-NEXT: vsetvli zero, zero, e16, m1, ta, ma
-; ZVFHMIN-NEXT: vfncvt.f.f.w v8, v14, v0.t
+; ZVFHMIN-NEXT: vfncvt.f.f.w v8, v14
; ZVFHMIN-NEXT: ret
%v = call <8 x half> @llvm.vp.fma.v8f16(<8 x half> %va, <8 x half> %b, <8 x half> %c, <8 x i1> %m, i32 %evl)
ret <8 x half> %v
@@ -236,13 +225,13 @@ define <8 x half> @vfma_vv_v8f16(<8 x half> %va, <8 x half> %b, <8 x half> %c, <
define <8 x half> @vfma_vv_v8f16_unmasked(<8 x half> %va, <8 x half> %b, <8 x half> %c, i32 zeroext %evl) {
; ZVFH-LABEL: vfma_vv_v8f16_unmasked:
; ZVFH: # %bb.0:
-; ZVFH-NEXT: vsetvli zero, a0, e16, m1, ta, ma
+; ZVFH-NEXT: vsetivli zero, 8, e16, m1, ta, ma
; ZVFH-NEXT: vfmadd.vv v8, v9, v10
; ZVFH-NEXT: ret
;
; ZVFHMIN-LABEL: vfma_vv_v8f16_unmasked:
; ZVFHMIN: # %bb.0:
-; ZVFHMIN-NEXT: vsetvli zero, a0, e16, m1, ta, ma
+; ZVFHMIN-NEXT: vsetivli zero, 8, e16, m1, ta, ma
; ZVFHMIN-NEXT: vfwcvt.f.f.v v12, v10
; ZVFHMIN-NEXT: vfwcvt.f.f.v v10, v8
; ZVFHMIN-NEXT: vfwcvt.f.f.v v14, v9
@@ -258,24 +247,22 @@ define <8 x half> @vfma_vv_v8f16_unmasked(<8 x half> %va, <8 x half> %b, <8 x ha
define <8 x half> @vfma_vf_v8f16(<8 x half> %va, half %...
[truncated]
Part of the work to remove trivial VP intrinsics from the RISC-V backend, see https://discourse.llvm.org/t/rfc-remove-codegen-support-for-trivial-vp-intrinsics-in-the-risc-v-backend/87999

This splits off 2 intrinsics from llvm#179622. The remaining sign-bit intrinsic, vp_fneg, is expanded in llvm#190589 since other tests rely on it.
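To illustrate the vmerge.vvm folding issue described in the PR description, a hedged LLVM IR sketch with a hypothetical <4 x float> case (the actual tests use the vfmacc/vfmsac patterns in the diff above):

; After expansion, the fixed-length vp.fma becomes a plain fma, which
; lowers with a constant VL of 4 taken from the vector type:
%fma = call <4 x float> @llvm.fma.v4f32(<4 x float> %a, <4 x float> %b, <4 x float> %c)

; A dynamic EVL here gives the vmerge.vvm a VL different from the vfmadd
; above, so the two can't be folded into a single masked vfmacc.vv:
%merge = call <4 x float> @llvm.vp.merge.v4f32(<4 x i1> %m, <4 x float> %fma, <4 x float> %c, i32 %evl)

; With an immediate EVL matching the fixed vector length, both operations
; share VL=4 and the vmerge.vvm folds away:
%merge.imm = call <4 x float> @llvm.vp.merge.v4f32(<4 x i1> %m, <4 x float> %fma, <4 x float> %c, i32 4)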