[SystemZ] Add support for half (fp16) #109164
base: main
@llvm/pr-subscribers-llvm-selectiondag @llvm/pr-subscribers-clang
Author: Jonas Paulsson (JonPsson1)
Changes
Make sure that fp16<=>float conversions are expanded to libcalls and that 16-bit fp values can be loaded and stored properly via GPRs. With this patch the Half IR Type used in operations should be handled correctly with the help of pre-existing ISD node expansions.
Patch in progress...
Notes:
Full diff: https://github.com/llvm/llvm-project/pull/109164.diff 3 Files Affected:
diff --git a/clang/lib/Basic/Targets/SystemZ.h b/clang/lib/Basic/Targets/SystemZ.h
index f05ea473017bec..6566b63d4587ee 100644
--- a/clang/lib/Basic/Targets/SystemZ.h
+++ b/clang/lib/Basic/Targets/SystemZ.h
@@ -91,11 +91,20 @@ class LLVM_LIBRARY_VISIBILITY SystemZTargetInfo : public TargetInfo {
"-v128:64-a:8:16-n32:64");
}
MaxAtomicPromoteWidth = MaxAtomicInlineWidth = 128;
+
+ HasLegalHalfType = false; // Default=false
+ HalfArgsAndReturns = false; // Default=false
+ HasFloat16 = true; // Default=false
+
HasStrictFP = true;
}
unsigned getMinGlobalAlign(uint64_t Size, bool HasNonWeakDef) const override;
+ bool useFP16ConversionIntrinsics() const override {
+ return false;
+ }
+
void getTargetDefines(const LangOptions &Opts,
MacroBuilder &Builder) const override;
diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
index 582a8c139b2937..fd3dcebba1eca7 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
@@ -704,6 +704,13 @@ SystemZTargetLowering::SystemZTargetLowering(const TargetMachine &TM,
setOperationAction(ISD::BITCAST, MVT::f32, Custom);
}
+ // Expand FP16 <=> FP32 conversions to libcalls and handle FP16 loads and
+ // stores in GPRs.
+ setOperationAction(ISD::FP16_TO_FP, MVT::f32, Expand);
+ setOperationAction(ISD::FP_TO_FP16, MVT::f32, Expand);
+ setLoadExtAction(ISD::EXTLOAD, MVT::f32, MVT::f16, Expand);
+ setTruncStoreAction(MVT::f32, MVT::f16, Expand);
+
// VASTART and VACOPY need to deal with the SystemZ-specific varargs
// structure, but VAEND is a no-op.
setOperationAction(ISD::VASTART, MVT::Other, Custom);
diff --git a/llvm/test/CodeGen/SystemZ/fp-half.ll b/llvm/test/CodeGen/SystemZ/fp-half.ll
new file mode 100644
index 00000000000000..393ba2f620ff6e
--- /dev/null
+++ b/llvm/test/CodeGen/SystemZ/fp-half.ll
@@ -0,0 +1,100 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z10 | FileCheck %s
+;
+; Tests for FP16 (Half).
+
+; A function where everything is done in Half.
+define void @fun0(ptr %Op0, ptr %Op1, ptr %Dst) {
+; CHECK-LABEL: fun0:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: stmg %r12, %r15, 96(%r15)
+; CHECK-NEXT: .cfi_offset %r12, -64
+; CHECK-NEXT: .cfi_offset %r13, -56
+; CHECK-NEXT: .cfi_offset %r14, -48
+; CHECK-NEXT: .cfi_offset %r15, -40
+; CHECK-NEXT: aghi %r15, -168
+; CHECK-NEXT: .cfi_def_cfa_offset 328
+; CHECK-NEXT: std %f8, 160(%r15) # 8-byte Folded Spill
+; CHECK-NEXT: .cfi_offset %f8, -168
+; CHECK-NEXT: llgh %r2, 0(%r2)
+; CHECK-NEXT: lgr %r13, %r4
+; CHECK-NEXT: lgr %r12, %r3
+; CHECK-NEXT: brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT: llgh %r2, 0(%r12)
+; CHECK-NEXT: ler %f8, %f0
+; CHECK-NEXT: brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT: aebr %f0, %f8
+; CHECK-NEXT: brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT: sth %r2, 0(%r13)
+; CHECK-NEXT: ld %f8, 160(%r15) # 8-byte Folded Reload
+; CHECK-NEXT: lmg %r12, %r15, 264(%r15)
+; CHECK-NEXT: br %r14
+entry:
+ %0 = load half, ptr %Op0, align 2
+ %1 = load half, ptr %Op1, align 2
+ %add = fadd half %0, %1
+ store half %add, ptr %Dst, align 2
+ ret void
+}
+
+; A function where Half values are loaded and extended to float and then
+; operated on.
+define void @fun1(ptr %Op0, ptr %Op1, ptr %Dst) {
+; CHECK-LABEL: fun1:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: stmg %r12, %r15, 96(%r15)
+; CHECK-NEXT: .cfi_offset %r12, -64
+; CHECK-NEXT: .cfi_offset %r13, -56
+; CHECK-NEXT: .cfi_offset %r14, -48
+; CHECK-NEXT: .cfi_offset %r15, -40
+; CHECK-NEXT: aghi %r15, -168
+; CHECK-NEXT: .cfi_def_cfa_offset 328
+; CHECK-NEXT: std %f8, 160(%r15) # 8-byte Folded Spill
+; CHECK-NEXT: .cfi_offset %f8, -168
+; CHECK-NEXT: llgh %r2, 0(%r2)
+; CHECK-NEXT: lgr %r13, %r4
+; CHECK-NEXT: lgr %r12, %r3
+; CHECK-NEXT: brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT: llgh %r2, 0(%r12)
+; CHECK-NEXT: ler %f8, %f0
+; CHECK-NEXT: brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT: aebr %f0, %f8
+; CHECK-NEXT: brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT: sth %r2, 0(%r13)
+; CHECK-NEXT: ld %f8, 160(%r15) # 8-byte Folded Reload
+; CHECK-NEXT: lmg %r12, %r15, 264(%r15)
+; CHECK-NEXT: br %r14
+entry:
+ %0 = load half, ptr %Op0, align 2
+ %ext = fpext half %0 to float
+ %1 = load half, ptr %Op1, align 2
+ %ext1 = fpext half %1 to float
+ %add = fadd float %ext, %ext1
+ %res = fptrunc float %add to half
+ store half %res, ptr %Dst, align 2
+ ret void
+}
+
+; Test case with a Half incoming argument.
+define zeroext i1 @fun2(half noundef %f) {
+; CHECK-LABEL: fun2:
+; CHECK: # %bb.0: # %start
+; CHECK-NEXT: stmg %r14, %r15, 112(%r15)
+; CHECK-NEXT: .cfi_offset %r14, -48
+; CHECK-NEXT: .cfi_offset %r15, -40
+; CHECK-NEXT: aghi %r15, -160
+; CHECK-NEXT: .cfi_def_cfa_offset 320
+; CHECK-NEXT: brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT: brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT: larl %r1, .LCPI2_0
+; CHECK-NEXT: deb %f0, 0(%r1)
+; CHECK-NEXT: brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT: risbg %r2, %r2, 63, 191, 49
+; CHECK-NEXT: lmg %r14, %r15, 272(%r15)
+; CHECK-NEXT: br %r14
+start:
+ %self = fdiv half %f, 0xHC700
+ %_4 = bitcast half %self to i16
+ %_0 = icmp slt i16 %_4, 0
+ ret i1 %_0
+}
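In the test output above, each half operation becomes an __gnu_h2f_ieee / __gnu_f2h_ieee round trip around an f32 instruction (aebr, deb). For reference, here is a minimal C sketch of what the half-to-float direction computes on a raw binary16 bit pattern. It is not compiler-rt's implementation; `half_bits_to_float` and `pow2f_exact` are hypothetical names, and the sketch does not attempt to preserve NaN payloads. Decoding the `0xHC700` constant from @fun2 with it gives -7.0.

```c
#include <math.h>    /* INFINITY, NAN macros only; no libm functions are called */
#include <stdint.h>

/* 2^e as a float by repeated exact scaling (fine for the small |e| used here). */
static float pow2f_exact(int e) {
  float r = 1.0f;
  for (; e > 0; --e) r *= 2.0f;
  for (; e < 0; ++e) r *= 0.5f;
  return r;
}

/* Decode an IEEE binary16 bit pattern into a float. Every binary16 value is
   exactly representable in binary32, so no rounding happens here. */
static float half_bits_to_float(uint16_t h) {
  int negative = (h & 0x8000) != 0;
  int exp = (h >> 10) & 0x1f;   /* 5-bit biased exponent, bias 15 */
  int frac = h & 0x3ff;         /* 10 explicit fraction bits */
  float mag;

  if (exp == 0x1f)
    mag = frac ? NAN : INFINITY;                          /* Inf or NaN */
  else if (exp == 0)
    mag = (float)frac * pow2f_exact(-24);                 /* zero / subnormal */
  else
    mag = (float)(frac | 0x400) * pow2f_exact(exp - 25);  /* implicit leading 1 */
  return negative ? -mag : mag;
}
```

For 0xC700 the sign is 1, the biased exponent is 17 and the fraction is 0x300, so the value is -(1024 + 768) * 2^-8 = -7.0.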
You can test this locally with the following command:
git-clang-format --diff HEAD~1 HEAD --extensions cpp,c,h -- clang/test/CodeGen/SystemZ/Float16.c clang/test/CodeGen/SystemZ/fp16.c compiler-rt/lib/builtins/extendhfdf2.c compiler-rt/test/builtins/Unit/extendhfdf2_test.c clang/include/clang/Basic/TargetInfo.h clang/lib/Basic/Targets/SystemZ.h clang/lib/CodeGen/Targets/SystemZ.cpp clang/test/CodeGen/SystemZ/strictfp_builtins.c clang/test/CodeGen/SystemZ/systemz-abi.c clang/test/CodeGen/SystemZ/systemz-inline-asm.c compiler-rt/lib/builtins/clear_cache.c llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp llvm/lib/Target/SystemZ/MCTargetDesc/SystemZMCTargetDesc.cpp llvm/lib/Target/SystemZ/MCTargetDesc/SystemZMCTargetDesc.h llvm/lib/Target/SystemZ/SystemZAsmPrinter.cpp llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp llvm/lib/Target/SystemZ/SystemZISelLowering.cpp llvm/lib/Target/SystemZ/SystemZISelLowering.h llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp llvm/lib/Target/SystemZ/SystemZRegisterInfo.cpp llvm/lib/Target/SystemZ/SystemZRegisterInfo.h
View the diff from clang-format here.
diff --git a/compiler-rt/test/builtins/Unit/extendhfdf2_test.c b/compiler-rt/test/builtins/Unit/extendhfdf2_test.c
index 422e272c1..bf33291d8 100644
--- a/compiler-rt/test/builtins/Unit/extendhfdf2_test.c
+++ b/compiler-rt/test/builtins/Unit/extendhfdf2_test.c
@@ -7,81 +7,63 @@
double __extendhfdf2(TYPE_FP16 a);
-int test__extendhfdf2(TYPE_FP16 a, uint64_t expected)
-{
- double x = __extendhfdf2(a);
- int ret = compareResultD(x, expected);
+int test__extendhfdf2(TYPE_FP16 a, uint64_t expected) {
+ double x = __extendhfdf2(a);
+ int ret = compareResultD(x, expected);
- if (ret){
- printf("error in test__extendhfdf2(%#.4x) = %f, "
- "expected %f\n", toRep16(a), x, fromRep64(expected));
- }
- return ret;
+ if (ret) {
+ printf("error in test__extendhfdf2(%#.4x) = %f, "
+ "expected %f\n",
+ toRep16(a), x, fromRep64(expected));
+ }
+ return ret;
}
char assumption_1[sizeof(TYPE_FP16) * CHAR_BIT == 16] = {0};
-int main()
-{
- // qNaN
- if (test__extendhfdf2(makeQNaN16(),
- UINT64_C(0x7ff8000000000000)))
- return 1;
- // NaN
- if (test__extendhfdf2(fromRep16(0x7f80),
- UINT64_C(0x7ffe000000000000)))
- return 1;
- // inf
- if (test__extendhfdf2(makeInf16(),
- UINT64_C(0x7ff0000000000000)))
- return 1;
- // -inf
- if (test__extendhfdf2(makeNegativeInf16(),
- UINT64_C(0xfff0000000000000)))
- return 1;
- // zero
- if (test__extendhfdf2(fromRep16(0x0),
- UINT64_C(0x0)))
- return 1;
- // -zero
- if (test__extendhfdf2(fromRep16(0x8000),
- UINT64_C(0x8000000000000000)))
- return 1;
- if (test__extendhfdf2(fromRep16(0x4248),
- UINT64_C(0x4009200000000000)))
- return 1;
- if (test__extendhfdf2(fromRep16(0xc248),
- UINT64_C(0xc009200000000000)))
- return 1;
- if (test__extendhfdf2(fromRep16(0x6e62),
- UINT64_C(0x40b9880000000000)))
- return 1;
- if (test__extendhfdf2(fromRep16(0x3c00),
- UINT64_C(0x3ff0000000000000)))
- return 1;
- if (test__extendhfdf2(fromRep16(0x0400),
- UINT64_C(0x3f10000000000000)))
- return 1;
- // denormal
- if (test__extendhfdf2(fromRep16(0x0010),
- UINT64_C(0x3eb0000000000000)))
- return 1;
- if (test__extendhfdf2(fromRep16(0x0001),
- UINT64_C(0x3e70000000000000)))
- return 1;
- if (test__extendhfdf2(fromRep16(0x8001),
- UINT64_C(0xbe70000000000000)))
- return 1;
- if (test__extendhfdf2(fromRep16(0x0001),
- UINT64_C(0x3e70000000000000)))
- return 1;
- // max (precise)
- if (test__extendhfdf2(fromRep16(0x7bff),
- UINT64_C(0x40effc0000000000)))
- return 1;
- // max (rounded)
- if (test__extendhfdf2(fromRep16(0x7bff),
- UINT64_C(0x40effc0000000000)))
- return 1;
- return 0;
+int main() {
+ // qNaN
+ if (test__extendhfdf2(makeQNaN16(), UINT64_C(0x7ff8000000000000)))
+ return 1;
+ // NaN
+ if (test__extendhfdf2(fromRep16(0x7f80), UINT64_C(0x7ffe000000000000)))
+ return 1;
+ // inf
+ if (test__extendhfdf2(makeInf16(), UINT64_C(0x7ff0000000000000)))
+ return 1;
+ // -inf
+ if (test__extendhfdf2(makeNegativeInf16(), UINT64_C(0xfff0000000000000)))
+ return 1;
+ // zero
+ if (test__extendhfdf2(fromRep16(0x0), UINT64_C(0x0)))
+ return 1;
+ // -zero
+ if (test__extendhfdf2(fromRep16(0x8000), UINT64_C(0x8000000000000000)))
+ return 1;
+ if (test__extendhfdf2(fromRep16(0x4248), UINT64_C(0x4009200000000000)))
+ return 1;
+ if (test__extendhfdf2(fromRep16(0xc248), UINT64_C(0xc009200000000000)))
+ return 1;
+ if (test__extendhfdf2(fromRep16(0x6e62), UINT64_C(0x40b9880000000000)))
+ return 1;
+ if (test__extendhfdf2(fromRep16(0x3c00), UINT64_C(0x3ff0000000000000)))
+ return 1;
+ if (test__extendhfdf2(fromRep16(0x0400), UINT64_C(0x3f10000000000000)))
+ return 1;
+ // denormal
+ if (test__extendhfdf2(fromRep16(0x0010), UINT64_C(0x3eb0000000000000)))
+ return 1;
+ if (test__extendhfdf2(fromRep16(0x0001), UINT64_C(0x3e70000000000000)))
+ return 1;
+ if (test__extendhfdf2(fromRep16(0x8001), UINT64_C(0xbe70000000000000)))
+ return 1;
+ if (test__extendhfdf2(fromRep16(0x0001), UINT64_C(0x3e70000000000000)))
+ return 1;
+ // max (precise)
+ if (test__extendhfdf2(fromRep16(0x7bff), UINT64_C(0x40effc0000000000)))
+ return 1;
+ // max (rounded)
+ if (test__extendhfdf2(fromRep16(0x7bff), UINT64_C(0x40effc0000000000)))
+ return 1;
+ return 0;
}
diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
index bed993b19..14ef29f8a 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
@@ -229,8 +229,8 @@ SystemZTargetLowering::SystemZTargetLowering(const TargetMachine &TM,
// The fp<=>i32/i64 conversions are all Legal except for f16 and for
// unsigned on z10 (only z196 and above have native support for
// unsigned conversions).
- for (auto Op : {ISD::FP_TO_SINT, ISD::STRICT_FP_TO_SINT,
- ISD::SINT_TO_FP, ISD::STRICT_SINT_TO_FP})
+ for (auto Op : {ISD::FP_TO_SINT, ISD::STRICT_FP_TO_SINT, ISD::SINT_TO_FP,
+ ISD::STRICT_SINT_TO_FP})
setOperationAction(Op, VT, Custom);
for (auto Op : {ISD::FP_TO_UINT, ISD::STRICT_FP_TO_UINT})
setOperationAction(Op, VT, Custom);
Note that you also need to have softPromoteHalfType return true to get correct legalization for half operations.
Thanks for pointing that out - patch updated.
I think we should define and implement a proper ABI for the half type as well.
Patch updated after some progress. With this version, fp16 values are passed to the conversion functions as integers, which seems to be the default. It is, however, a bit tricky to do this while at the same time passing half values in FP registers. One thing I wonder is whether it would be better to pass FP16 values to the conversion functions as _Float16 instead. It seems this could be changed in the configuration, judging by COMPILER_RT_HAS_FLOAT16 / compiler-rt/lib/builtins/extendhfsf2.c / fp_extend.h. I'm not really sure whether those conversion functions are meant to be built and used only for soft-promotion of fp16, or whether there are external implications, for instance GCC compatibility. Any other comments are also welcome.
My understanding is that in GCC's
From your first two sentences it sounds like
@uweigand mentioned figuring out an ABI for
A quick check seems to show that GCC 13 does not support
Note that there are some common issues with these conversions; it would probably be good to test against them if possible: #97981 #97975.
From what I can see in the libgcc sources, I never see
Yes, we're working on that. What we're planning to do is to have
Yes, we'd have to add those. I don't think we want
Thanks for pointing this out!
I think this is accurate; libgcc just appears to (reasonably) not provide any f16-related symbols on platforms where GCC doesn't support For that reason we just always provide the symbols in Rust's compiler-builtins (though we let LLVM figure out that
That is great news, especially considering how problematic the target-feature-dependent ABI on x86-32 has been.
Patch reworked:
(twoaddr-kill.mir test updated as the hard-coded register class enum value for GRH32BitRegClass has changed.) Still some more points to go over, but it seems to be working fairly well at this point.
Patch improved further:
Not a full review, but some general comments inline.
setOperationAction(ISD::FCOS, VT, Expand);
setOperationAction(ISD::FSINCOS, VT, Expand);
setOperationAction(ISD::FREM, VT, Expand);
setOperationAction(ISD::FPOW, VT, Expand);
Shouldn't these be Promote just like all the other f16 operations? Expand triggers a libcall, which doesn't match the excess-precision setting - also, we actually don't have f16 libcalls in libm ...
OK, since there are no f16 libcalls, it works to have them promoted.
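Promote here means: extend the f16 operands to f32, perform the f32 operation, and round the result back to f16 once, instead of calling an f16 libm function that does not exist. Below is a minimal C sketch of that pattern on raw bit patterns, including a round-to-nearest-even float-to-half conversion. All helper names are hypothetical, and division stands in for FCOS/FPOW only so the sketch needs no libm; for those operations cosf()/powf() would sit in the middle instead.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret float <-> uint32_t without violating aliasing rules. */
static uint32_t f32_to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float f32_from_bits(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* binary16 -> binary32, exact for every input. */
static float half_to_float(uint16_t h) {
  uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
  uint32_t exp = (h >> 10) & 0x1fu;
  uint32_t frac = h & 0x3ffu;
  if (exp == 0x1f)                                   /* Inf / NaN */
    return f32_from_bits(sign | 0x7f800000u | (frac << 13));
  if (exp == 0) {                                    /* zero / subnormal: frac * 2^-24 */
    float m = (float)frac * (1.0f / 16777216.0f);
    return sign ? -m : m;
  }
  return f32_from_bits(sign | ((exp + 112) << 23) | (frac << 13)); /* 112 = 127 - 15 */
}

/* binary32 -> binary16 with round-to-nearest-even. */
static uint16_t float_to_half(float f) {
  uint32_t x = f32_to_bits(f);
  uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
  uint32_t a = x & 0x7fffffffu;

  if (a >= 0x7f800000u)                              /* Inf, or NaN made quiet */
    return a == 0x7f800000u ? (uint16_t)(sign | 0x7c00u)
                            : (uint16_t)(sign | 0x7e00u | ((a >> 13) & 0x3ffu));
  if (a >= 0x477ff000u)                              /* >= 65520 rounds to Inf */
    return (uint16_t)(sign | 0x7c00u);
  if (a >= 0x38800000u) {                            /* >= 2^-14: normal half */
    uint32_t h = (((a >> 23) - 112) << 10) | ((a >> 13) & 0x3ffu);
    uint32_t dropped = a & 0x1fffu;                  /* the 13 discarded fraction bits */
    if (dropped > 0x1000u || (dropped == 0x1000u && (h & 1u)))
      h += 1;                                        /* a carry may bump the exponent */
    return (uint16_t)(sign | h);
  }
  uint32_t e = a >> 23;
  if (e < 102)                                       /* below 2^-25: rounds to zero */
    return sign;
  uint32_t sig = (a & 0x7fffffu) | 0x800000u;        /* 24-bit significand */
  uint32_t shift = 126 - e;                          /* 14..24 */
  uint32_t m = sig >> shift;
  uint32_t rem = sig & ((1u << shift) - 1);
  uint32_t halfway = 1u << (shift - 1);
  if (rem > halfway || (rem == halfway && (m & 1u)))
    m += 1;                                          /* may reach 0x400, the smallest normal */
  return (uint16_t)(sign | m);
}

/* Promote-style f16 division: extend, divide in f32, round back once.
   For + - * / the extra rounding is harmless because binary32 carries at
   least 2*11+2 significand bits; cosf()/powf() are only as good as f32. */
static uint16_t half_div_promoted(uint16_t a, uint16_t b) {
  return float_to_half(half_to_float(a) / half_to_float(b));
}
```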
Just crosslinking that there is an effort to add f16 libcalls #95250 but I have no clue what the plan is as far as lowering to them.
  return __extendXfYf2__(a);
}

COMPILER_RT_ABI float __gnu_h2d_ieee(src_t a) { return __extendhfdf2(a); }
I don't think you should define the __gnu_h2d_ieee or the __aeabi_h2d names here - those are not defined by the ARM standard or by other compilers today. This function should only be present under the __extendhfdf2 name.
ok - removed.
It seems that with a completely new build with the new cmake config the test actually passes:
I previously got confused here after rerunning cmake in the same build directory, as I thought it was enough to see (with ninja -v) that the newly built clang was being used for compiler-rt. Perhaps cmake itself got confused and used the system gcc for its checks while building with clang? Anyway, it works now, sorry.

I also found the cause of the miscompile (after some debugging, sigh): the file I copied (extendhfsf2.c) has float hard-coded as the return type, even though it defines DST_SINGLE. extendhftf2.c uses dst_t as the return type, but I see other files using the type directly, so I will leave those files and extendhftf2.c as they were. I guess this is a motivating case for using dst_t and src_t everywhere, but that's another issue. So the miscompile was due to a final conversion from double to float before returning. My bad.

While adding the test (corresponding to extendsfdf2_test.c), I tried inserting an obvious error, but it did not cause a failure, not even with ninja check-all, which I thought covered all tests (ninja check-compiler-rt did not report it either). How can I run these tests? I tried building (the same way as before) and running the existing extendhfsf2_test.c, but got:
This was unexpected, and I get the same error with libgcc.
I tried float.exposed, but I couldn't get it to convert an f16 hex value to a double hex value. Is it supposed to be able to do this?
It should, on the
-- A few more tests added covering
Would a test for { half, half, half, half } in Swift-return.ll make sense? I think test coverage is now fairly OK. I can't think of any more instructions to handle, although not all the various combinations and uses of them have been duplicated from float. (A test for the new extendhfdf2.c function is still TODO.)
-- Managed to build and run lbm with _Float16 by simply changing the element type of LBM_Grid from double to _Float16. It ran without crashing, although 4x faster, which I guess/hope is due to some value check that stops execution early. I confirmed that the executable does contain a lot of conversion calls, vlreph and more.
-- Another attempt to activate the tests for the builtins, but so far no luck. In compiler-rt/test/builtins/CMakeLists.txt, the librt_has_XXX list is built, but I tried
but no failure...
@Meinersbur @petrhosek @smeenai @danliew-apple Reaching out here to some people who seem to have been working with this CMake file: would you have any idea how I could activate and run the compiler-rt tests for s390x (see above for my unsuccessful experiments)? Thanks.
Thanks to @boomanaiden154 for pointing out that I need to pass -DCOMPILER_RT_BUILD_BUILTINS=ON to cmake - the tests now build and run. The test for half->double conversions is added, with the 64-bit hex values taken from the compiler-rt conversion function results (they should be correct, right :). I wanted to use the makeXXX16() functions as much as possible, but I was not sure what to pass to makeNaN16(uint16_t rand), so I left that case as it was, which should also be fine, I suppose.
I have nothing more to do on this for now, so waiting for review.
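Since the expected values in the half->double test were taken from compiler-rt's own output, an independent cross-check may be useful. The C sketch below (a hypothetical helper, not compiler-rt's code) widens a binary16 bit pattern to binary64 with integer operations only: it re-biases the exponent from 15 to 1023, left-aligns the 10-bit fraction in the 52-bit field, and normalizes subnormals. It reproduces every constant used in extendhfdf2_test.c. For instance, 0x4248 has biased exponent 16 and fraction 0x248, i.e. 2^1 * (1 + 584/1024) = 3.140625, which is 0x4009200000000000 in binary64.

```c
#include <stdint.h>

/* Widen a binary16 bit pattern to the binary64 bit pattern of the same value.
   Exact for every input; NaN payloads are kept (left-aligned). */
static uint64_t half_to_double_bits(uint16_t h) {
  uint64_t sign = (uint64_t)(h & 0x8000u) << 48;   /* bit 15 -> bit 63 */
  uint32_t exp = (h >> 10) & 0x1fu;
  uint64_t frac = h & 0x3ffu;

  if (exp == 0x1f)                                  /* Inf / NaN */
    return sign | 0x7ff0000000000000ULL | (frac << 42);
  if (exp == 0) {
    if (frac == 0)
      return sign;                                  /* +0 / -0 */
    /* Subnormal half (frac * 2^-24): shift the leading 1 up to bit 10,
       the implicit-bit position, and lower the exponent to match. */
    int e = -14;
    while ((frac & 0x400u) == 0) {
      frac <<= 1;
      --e;
    }
    return sign | ((uint64_t)(e + 1023) << 52) | ((frac & 0x3ffu) << 42);
  }
  /* Normal half: re-bias the exponent from 15 to 1023. */
  return sign | ((uint64_t)(exp + (1023 - 15)) << 52) | (frac << 42);
}
```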
Experiment with soft-promotion in FP regs (not working).
Try to make f16 legal instead
Atomic loads/stores, spill/reload, tests for __fp16 and half vectors.
strict f16 with tests.
Review
Make use of vector facility if present.
Use 4-byte spill size in case of no vector support.
Build the generic compiler-rt sources for s390x.
Don't set libcall names.
@llvm.s390.tdc, fcopysign, strict_fminimum/fmaximum.
More tests for f16, but not complete.
libfuncs built also to double and long double.
Patch rebased. The reported formatting issue is in the test, which however was mostly copied from a pre-existing test, so perhaps clang-format should be applied either to all of these tests or to none of them. Skipping it for now.
Updated per review.
FP_TO_INT, Z10 unsigned: For i128, which is not a legal type, this needs to be handled similarly in LowerOperationWrapper() instead. Unfortunately it is not possible to return and select "Expand or LibCall", so for the i128 case it seems necessary to emit the libcall using the makeLibCall() method. I moved the comment "Promoting the result to i64...so use the default expansion" that was present in SystemZISelLowering.cpp into this method, but I don't fully understand it. Is it talking about promoting to signed i64?
INT_TO_FP, Z10 unsigned:
This is about whether we can (and should) implement a fp->u32 conversion via fp->s64->u32. If the input is in the valid range for an u32, this will always result in the correct output. However, the question is what happens if the input is outside that range. The z196 instruction follows the IEEE rules and will generate an invalid operation exception, and I think we intended to match that semantics with the expanded code for z10. The fp->s64->u32 expansion would not generate an invalid operation exception if the input value is inside the s64 but outside the u32 range, that's why we chose not to use it. However, now that I'm looking at it again, this decision seems questionable for at least two reasons:
Maybe we should re-think this, but then again z10 doesn't really matter anymore at this point, so it probably doesn't make much sense to change codegen now ...
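To make the out-of-range case concrete, here is a minimal C sketch (hypothetical function names) contrasting the fp->s64->u32 expansion with the range check that an IEEE-conforming fp->u32 conversion performs. For 5e9, which fits in s64 but not in u32, the expansion silently yields 5e9 - 2^32 = 705032704 and nothing signals an invalid operation.

```c
#include <stdint.h>

/* fp -> s64 -> u32: correct for inputs in [0, 2^32), but for inputs that fit
   in s64 yet not in u32 the final truncation just wraps modulo 2^32. */
static uint32_t fp_to_u32_via_s64(double x) {
  int64_t s = (int64_t)x;   /* well defined while x is within the s64 range */
  return (uint32_t)s;       /* modular truncation: no exception is raised */
}

/* What a direct IEEE conversion must treat as an invalid operation: any
   input whose truncated value lies outside [0, 2^32), plus NaN. */
static int fp_to_u32_is_invalid(double x) {
  return !(x > -1.0 && x < 4294967296.0);
}
```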
Right, that makes sense (there is again the question of out-of-range inputs, but given the above discussion, I think this fine).
I see that the LowerOperationWrapper still emits i128 operations (e.g. you expand an f16->i128 into an f16->f32 and an f32->i128). But that routine is called because of the illegal input type i128, so I understand it must not leave any operations with the illegal type; rather, it should complete the expansion fully. This is annoying since for legal i128 we do the expansion in LowerOperation while for illegal i128 we do it in LowerOperationWrapper. This was the same problem for the atomics, where I avoided code duplication by implementing the expansion in LowerOperationWrapper and then simply calling that routine from LowerOperation.
I see, this is unfortunate, but seems the only option.
Why can't you still do the promotion via the Promote action for i32? Then you wouldn't have to duplicate the common-code case ...
Ah, this implementation is quite inefficient. There should be no extend/trunc libcalls needed at all: the sign bit in f16 is in the same place as in f32 or f64, so the actual CPSDR instruction should simply work on f16 too.
define half @f0(half %a, half %b) {
; CHECK-LABEL: f0:
; CHECK: brasl %r14, __extendhfsf2@PLT
; CHECK: brasl %r14, __extendhfsf2@PLT
; CHECK: cpsdr %f0, %f9, %f0
; CHECK: brasl %r14, __truncsfhf2@PLT
; CHECK: br %r14
%res = call half @llvm.copysign.f16(half %a, half %b)
(Nonblocker) A few other architectures use an asm lowering to avoid the calls here; it's just (a & !MASK) | (b & MASK) with MASK = 1 << 15. This is fine because copysign doesn't interact with fenv, including sNaN.
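A minimal C sketch of that bitwise lowering on raw binary16 patterns (`f16_copysign_bits` is a hypothetical name; `!` in the Rust-style expression above corresponds to `~` in C). It keeps the magnitude bits of the first operand and takes only the sign bit of the second, so no conversion to f32 is involved and even signaling-NaN payloads pass through untouched.

```c
#include <stdint.h>

#define F16_SIGN_MASK 0x8000u   /* MASK = 1 << 15 */

/* copysign(a, b) on binary16 bit patterns: magnitude of a, sign of b.
   Pure bit manipulation, so no FP exceptions and no NaN quieting. */
static uint16_t f16_copysign_bits(uint16_t a, uint16_t b) {
  return (uint16_t)((a & ~F16_SIGN_MASK) | (b & F16_SIGN_MASK));
}
```

For example, copysign(1.0, -0.0) maps 0x3c00 and 0x8000 to 0xbc00 (-1.0).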
fixed now by avoiding the conversions.
; CHECK-LABEL: f0:
; CHECK: brasl %r14, __extendhfsf2@PLT
; CHECK-NEXT: lpdfr %f0, %f0
; CHECK-NEXT: brasl %r14, __truncsfhf2@PLT
; CHECK: br %r14
Similarly, fabs could be a & !SIGN_MASK. It looks like aarch64 uses this; x86 still seems to extend and then truncate for whatever reason.
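The fabs counterpart is a single AND with the inverted sign mask. A minimal C sketch on raw binary16 patterns (`f16_fabs_bits` is a hypothetical name): it clears the sign of zeros, infinities and NaNs alike and leaves the other 15 bits untouched, which is what the extend/lpdfr/truncate sequence in the test computes with two libcalls.

```c
#include <stdint.h>

/* fabs on a binary16 bit pattern: clear bit 15, keep everything else. */
static uint16_t f16_fabs_bits(uint16_t a) {
  return (uint16_t)(a & 0x7fffu);   /* a & !SIGN_MASK in Rust notation */
}
```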
Not sure it would be worth doing a special lowering for this at this point?
Thanks for explaining - I updated the comment just a little bit so it is - at least to me - a bit easier to follow.
My understanding is that the TypeLegalizer calls here for i128 operands/results for Custom operations, even though the target doesn't have to do anything. If Results is returned empty, TypeLegalizer will try to handle it. New nodes in Results on the other hand will be visited as well.
I followed your example of calling LowerOperationWrapper instead to avoid the code duplication - looks better.
You are right, that is unnecessary. I must have gotten carried away with "extending all f16:s", not realizing that common code actually does this in this case.
Added new opcodes for half, following what was previously done for float and double. Special handling is needed to remove the fpround for f128, as it would otherwise be lowered to a libcall. (The fpround for f128 could be removed for float and double as well, but there are currently no tests for this.)
That's not quite what I was thinking. You now call into LowerOperationWrapper for f16 always - but f16 is always legal so there shouldn't be a need to do that. Instead, the type that is sometimes legal and sometimes not is i128 - so I would have thought the right way to call into LowerOperationWrapper for i128 always. (For example, when i128 is not legal, where is the libcall even emitted now? LowerOperation shouldn't be called for legal types, and your current LowerOperationWrapper doesn't emit libcalls?)
This seems to be more suitable for a DAGCombiner rule, as it is really a performance optimization. This could also be done as a separate patch (for all the types).
libcalls emission: For the uint->fp, SelectionDAGLegalize::ExpandLegalINT_TO_FP() has an assertion before the last attempt involving converting to SINT_TO_FP, that makes sure that this optimization is possible. If I change that to return SDValue() instead of asserting, these conversions now get a libcall emitted by common code. However, with fp->uint we get a working expansion but with two libcalls and a branch-sequence, instead of a single libcall (e.g. fp-conv-20.ll/@f13/-z13). Since there are specialized libcalls available, it seems this wouldn't be acceptable, so keeping the
Removed the lowering that removed the fp_round of fp128; this will be followed up instead, along with float and double.
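For context on the fp->uint expansion mentioned in the libcall-emission comment above, here is a C sketch of the classic branch-based shape of expanding an unsigned conversion through a signed one. It is hypothetical and shown for double->u64 rather than the f16/i128 case: values below 2^63 use the signed conversion directly, larger ones are shifted down by 2^63 first and the top bit is restored afterwards. It only illustrates why the generic expansion needs a compare and two conversion paths where a dedicated unsigned libcall needs a single call; it is not the exact DAG that SelectionDAG builds.

```c
#include <stdint.h>

/* fp -> u64 expanded via the signed conversion. Intended for 0 <= x < 2^64;
   negative, NaN and too-large inputs are outside this sketch. */
static uint64_t fp_to_u64_via_signed(double x) {
  const double two63 = 9223372036854775808.0;  /* 2^63, exactly representable */
  if (x < two63)
    return (uint64_t)(int64_t)x;                /* fits the signed range */
  /* Here 2^63 <= x < 2^64: x - 2^63 is exact and fits in int64_t;
     the XOR puts the subtracted 2^63 back as the top bit. */
  return (uint64_t)(int64_t)(x - two63) ^ 0x8000000000000000ULL;
}
```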
Make sure that fp16<=>float conversions are expanded to libcalls and that 16-bit fp values can be loaded and stored properly via GPRs. With this patch the Half IR Type used in operations should be handled correctly with the help of pre-existing ISD node expansions.
Patch in progress...
Fixes #50374