[PowerPC] Change half to use soft promotion rather than PromoteFloat
#152632
The second commit is the interesting part here. The first commit is in a standalone PR at #152625 and should land first.
I am referencing https://ftp.rtems.org/pub/rtems/people/sebh/ABI64BitOpenPOWERv1.1_16July2015_pub.pdf for the ABI not having
There is a new crash expanding LRINT that I am trying to figure out. |
The latest specification of the ELFv2 ABI can be found here: https://files.openpower.foundation/s/cfA2oFPXbbZwEBK :) "Table 2.13. Decimal Floating-Point Types" does not contain |
|
Oh that's great, thank you for the link!
I think the correct place would be "Table 2.11. Scalar Types" under "Binary Floating-Point" ( |
|
I'm stuck on the new crash, thought it just needed a new |
35a405b to 77508b7
|
@chenzheng1030, @EsmeYi, @lei137, @RolandF77 could you review this? Only the last commit "[PowerPC] Change half to use soft promotion" is relevant to be reviewed for this PR. There are other commits here because it needs a few other things to land first:
I'm avoiding marking this as "ready to review" for now so it doesn't ping everybody from the backends touched in these other PRs, but the change to soft promotion here should be ready to review. |
77508b7 to a8d24ba
|
(current failure in "ERROR: test_modulelist_deadlock" has to be unrelated) |
a8d24ba to dcecedd
81b123e to a636c23
On PowerPC targets, `half` uses the default legalization of promoting to a `f32`. However, this has some fundamental issues related to inability to round trip. Resolve this by switching to the soft legalization, which passes `f16` as an `i16`.

The PowerPC ABI Specification does not define a `_Float16` type, so the calling convention changes are acceptable.

Fixes the PowerPC portion of [1]. A similar change was done for MIPS in f0231b6 ("[MIPS] Use softPromoteHalf legalization for fp16 rather than PromoteFloat (llvm#110199)") and for Loongarch in 13280d9 ("[loongarch][DAG][FREEZE] Fix crash when FREEZE a half(f16) type on loongarch (llvm#107791)").

[1]: llvm#97975
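The commit message does not say which round-trip failure it has in mind. One plausible instance, sketched below under that assumption, is signaling-NaN quieting: IEEE 754 format conversions turn a signaling NaN into a quiet NaN, so a `half` that is widened to `f32` and narrowed again can come back with different bits. The helper names here are hypothetical and only model the bit-level effect; they are not LLVM or compiler-rt code.

```cpp
#include <cassert>
#include <cstdint>

// Model only. The real conversions are libcalls such as __extendhfsf2 and
// __truncsfhf2 (both appear in the test diff). These hypothetical helpers
// capture a single IEEE 754 rule: converting a signaling NaN between
// formats yields a quiet NaN.

// True if the binary16 bit pattern encodes a NaN (exponent all ones,
// non-zero mantissa).
constexpr bool is_nan_f16(std::uint16_t bits) {
  return (bits & 0x7C00u) == 0x7C00u && (bits & 0x03FFu) != 0;
}

// Assumed model of the PromoteFloat path: f16 -> f32 -> f16.
// Non-NaN values survive exactly because every binary16 value is
// representable in binary32. NaNs come back with the quiet bit (0x0200) set.
constexpr std::uint16_t promoted_round_trip(std::uint16_t bits) {
  return is_nan_f16(bits) ? static_cast<std::uint16_t>(bits | 0x0200u) : bits;
}

// Soft promotion keeps the value as an i16 between operations, so moving
// it around is a plain integer copy.
constexpr std::uint16_t soft_round_trip(std::uint16_t bits) { return bits; }

// Compile-time checks of the model.
static_assert(promoted_round_trip(0x3C00) == 0x3C00, "1.0 survives");
static_assert(promoted_round_trip(0x7D00) == 0x7F00, "sNaN is quieted");
static_assert(soft_round_trip(0x7D00) == 0x7D00, "bits are preserved");

// Runtime spot check: the two paths disagree on a signaling NaN.
inline void check_round_trips() {
  assert(promoted_round_trip(0x7D00) != soft_round_trip(0x7D00));
}
```

In this model, loading, storing, bitcasting, or returning a `half` under soft promotion never touches the bits. That matches the updated `store`, `return`, `from_bits`, and `to_bits` test expectations in the diff, which no longer call the conversion libcalls.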
a636c23 to f0508b4
|
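@llvm/pr-subscribers-backend-powerpc

Author: Trevor Gross (tgross35)

Changes

On PowerPC targets, `half` uses the default legalization of promoting to a `f32`. However, this has some fundamental issues related to inability to round trip. Resolve this by switching to the soft legalization, which passes `f16` as an `i16`.

The PowerPC ABI Specification does not define a `_Float16` type, so the calling convention changes are acceptable.

Fixes the PowerPC part of #97975

Patch is 291.93 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/152632.diff

13 Files Affected: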
diff --git a/llvm/docs/ReleaseNotes.md b/llvm/docs/ReleaseNotes.md
index 910a50214df2f..169d4026820d8 100644
--- a/llvm/docs/ReleaseNotes.md
+++ b/llvm/docs/ReleaseNotes.md
@@ -142,6 +142,8 @@ Changes to the MIPS Backend
Changes to the PowerPC Backend
------------------------------
+* `half` now uses a soft float ABI, which works correctly in more cases.
+
Changes to the RISC-V Backend
-----------------------------
diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.h b/llvm/lib/Target/PowerPC/PPCISelLowering.h
index daae839479c3c..97546f37e6d5e 100644
--- a/llvm/lib/Target/PowerPC/PPCISelLowering.h
+++ b/llvm/lib/Target/PowerPC/PPCISelLowering.h
@@ -212,6 +212,8 @@ namespace llvm {
bool useSoftFloat() const override;
+ bool softPromoteHalfType() const override { return true; }
+
bool hasSPE() const;
MVT getScalarShiftAmountTy(const DataLayout &, EVT) const override {
diff --git a/llvm/test/CodeGen/Generic/half.ll b/llvm/test/CodeGen/Generic/half.ll
index ef7bfe2f2d9ce..f19d6da4eeb7b 100644
--- a/llvm/test/CodeGen/Generic/half.ll
+++ b/llvm/test/CodeGen/Generic/half.ll
@@ -29,9 +29,9 @@
; RUN: %if mips-registered-target %{ llc %s -o - -mtriple=mipsel-unknown-linux-gnu | FileCheck %s --check-prefixes=ALL,CHECK %}
; RUN: %if msp430-registered-target %{ llc %s -o - -mtriple=msp430-none-elf | FileCheck %s --check-prefixes=ALL,CHECK %}
; RUN: %if nvptx-registered-target %{ llc %s -o - -mtriple=nvptx64-nvidia-cuda | FileCheck %s --check-prefixes=NOCRASH %}
-; RUN: %if powerpc-registered-target %{ llc %s -o - -mtriple=powerpc-unknown-linux-gnu | FileCheck %s --check-prefixes=ALL,BAD %}
-; RUN: %if powerpc-registered-target %{ llc %s -o - -mtriple=powerpc64-unknown-linux-gnu | FileCheck %s --check-prefixes=ALL,BAD %}
-; RUN: %if powerpc-registered-target %{ llc %s -o - -mtriple=powerpc64le-unknown-linux-gnu | FileCheck %s --check-prefixes=ALL,BAD %}
+; RUN: %if powerpc-registered-target %{ llc %s -o - -mtriple=powerpc-unknown-linux-gnu | FileCheck %s --check-prefixes=ALL,CHECK %}
+; RUN: %if powerpc-registered-target %{ llc %s -o - -mtriple=powerpc64-unknown-linux-gnu | FileCheck %s --check-prefixes=ALL,CHECK %}
+; RUN: %if powerpc-registered-target %{ llc %s -o - -mtriple=powerpc64le-unknown-linux-gnu | FileCheck %s --check-prefixes=ALL,CHECK %}
; RUN: %if riscv-registered-target %{ llc %s -o - -mtriple=riscv32-unknown-linux-gnu | FileCheck %s --check-prefixes=ALL,CHECK %}
; RUN: %if riscv-registered-target %{ llc %s -o - -mtriple=riscv64-unknown-linux-gnu | FileCheck %s --check-prefixes=ALL,CHECK %}
; RUN: %if sparc-registered-target %{ llc %s -o - -mtriple=sparc-unknown-linux-gnu | FileCheck %s --check-prefixes=ALL,CHECK %}
diff --git a/llvm/test/CodeGen/PowerPC/atomics.ll b/llvm/test/CodeGen/PowerPC/atomics.ll
index ff1a7222cc92c..54a35dab2a422 100644
--- a/llvm/test/CodeGen/PowerPC/atomics.ll
+++ b/llvm/test/CodeGen/PowerPC/atomics.ll
@@ -469,39 +469,20 @@ define i64 @and_i64_release(ptr %mem, i64 %operand) {
define half @load_atomic_f16__seq_cst(ptr %ptr) {
; PPC32-LABEL: load_atomic_f16__seq_cst:
; PPC32: # %bb.0:
-; PPC32-NEXT: mflr r0
-; PPC32-NEXT: stwu r1, -16(r1)
-; PPC32-NEXT: stw r0, 20(r1)
-; PPC32-NEXT: .cfi_def_cfa_offset 16
-; PPC32-NEXT: .cfi_offset lr, 4
; PPC32-NEXT: sync
; PPC32-NEXT: lhz r3, 0(r3)
; PPC32-NEXT: cmpw cr7, r3, r3
; PPC32-NEXT: bne- cr7, .+4
; PPC32-NEXT: isync
-; PPC32-NEXT: bl __extendhfsf2
-; PPC32-NEXT: lwz r0, 20(r1)
-; PPC32-NEXT: addi r1, r1, 16
-; PPC32-NEXT: mtlr r0
; PPC32-NEXT: blr
;
; PPC64-LABEL: load_atomic_f16__seq_cst:
; PPC64: # %bb.0:
-; PPC64-NEXT: mflr r0
-; PPC64-NEXT: stdu r1, -112(r1)
-; PPC64-NEXT: std r0, 128(r1)
-; PPC64-NEXT: .cfi_def_cfa_offset 112
-; PPC64-NEXT: .cfi_offset lr, 16
; PPC64-NEXT: sync
; PPC64-NEXT: lhz r3, 0(r3)
; PPC64-NEXT: cmpd cr7, r3, r3
; PPC64-NEXT: bne- cr7, .+4
; PPC64-NEXT: isync
-; PPC64-NEXT: bl __extendhfsf2
-; PPC64-NEXT: nop
-; PPC64-NEXT: addi r1, r1, 112
-; PPC64-NEXT: ld r0, 16(r1)
-; PPC64-NEXT: mtlr r0
; PPC64-NEXT: blr
%val = load atomic half, ptr %ptr seq_cst, align 2
ret half %val
@@ -575,44 +556,11 @@ define double @load_atomic_f64__seq_cst(ptr %ptr) {
}
define void @store_atomic_f16__seq_cst(ptr %ptr, half %val1) {
-; PPC32-LABEL: store_atomic_f16__seq_cst:
-; PPC32: # %bb.0:
-; PPC32-NEXT: mflr r0
-; PPC32-NEXT: stwu r1, -16(r1)
-; PPC32-NEXT: stw r0, 20(r1)
-; PPC32-NEXT: .cfi_def_cfa_offset 16
-; PPC32-NEXT: .cfi_offset lr, 4
-; PPC32-NEXT: .cfi_offset r30, -8
-; PPC32-NEXT: stw r30, 8(r1) # 4-byte Folded Spill
-; PPC32-NEXT: mr r30, r3
-; PPC32-NEXT: bl __truncsfhf2
-; PPC32-NEXT: sync
-; PPC32-NEXT: sth r3, 0(r30)
-; PPC32-NEXT: lwz r30, 8(r1) # 4-byte Folded Reload
-; PPC32-NEXT: lwz r0, 20(r1)
-; PPC32-NEXT: addi r1, r1, 16
-; PPC32-NEXT: mtlr r0
-; PPC32-NEXT: blr
-;
-; PPC64-LABEL: store_atomic_f16__seq_cst:
-; PPC64: # %bb.0:
-; PPC64-NEXT: mflr r0
-; PPC64-NEXT: stdu r1, -128(r1)
-; PPC64-NEXT: std r0, 144(r1)
-; PPC64-NEXT: .cfi_def_cfa_offset 128
-; PPC64-NEXT: .cfi_offset lr, 16
-; PPC64-NEXT: .cfi_offset r30, -16
-; PPC64-NEXT: std r30, 112(r1) # 8-byte Folded Spill
-; PPC64-NEXT: mr r30, r3
-; PPC64-NEXT: bl __truncsfhf2
-; PPC64-NEXT: nop
-; PPC64-NEXT: sync
-; PPC64-NEXT: sth r3, 0(r30)
-; PPC64-NEXT: ld r30, 112(r1) # 8-byte Folded Reload
-; PPC64-NEXT: addi r1, r1, 128
-; PPC64-NEXT: ld r0, 16(r1)
-; PPC64-NEXT: mtlr r0
-; PPC64-NEXT: blr
+; CHECK-LABEL: store_atomic_f16__seq_cst:
+; CHECK: # %bb.0:
+; CHECK-NEXT: sync
+; CHECK-NEXT: sth r4, 0(r3)
+; CHECK-NEXT: blr
store atomic half %val1, ptr %ptr seq_cst, align 2
ret void
}
diff --git a/llvm/test/CodeGen/PowerPC/f128-conv.ll b/llvm/test/CodeGen/PowerPC/f128-conv.ll
index f8b2861156db4..080843217e8c9 100644
--- a/llvm/test/CodeGen/PowerPC/f128-conv.ll
+++ b/llvm/test/CodeGen/PowerPC/f128-conv.ll
@@ -1349,9 +1349,6 @@ define half @trunc(fp128 %a) nounwind {
; CHECK-NEXT: std r0, 48(r1)
; CHECK-NEXT: bl __trunckfhf2
; CHECK-NEXT: nop
-; CHECK-NEXT: clrlwi r3, r3, 16
-; CHECK-NEXT: mtfprwz f0, r3
-; CHECK-NEXT: xscvhpdp f1, f0
; CHECK-NEXT: addi r1, r1, 32
; CHECK-NEXT: ld r0, 16(r1)
; CHECK-NEXT: mtlr r0
@@ -1364,9 +1361,6 @@ define half @trunc(fp128 %a) nounwind {
; CHECK-P8-NEXT: std r0, 48(r1)
; CHECK-P8-NEXT: bl __trunckfhf2
; CHECK-P8-NEXT: nop
-; CHECK-P8-NEXT: clrldi r3, r3, 48
-; CHECK-P8-NEXT: bl __extendhfsf2
-; CHECK-P8-NEXT: nop
; CHECK-P8-NEXT: addi r1, r1, 32
; CHECK-P8-NEXT: ld r0, 16(r1)
; CHECK-P8-NEXT: mtlr r0
@@ -1379,7 +1373,9 @@ entry:
define fp128 @ext(half %a) nounwind {
; CHECK-LABEL: ext:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: xscpsgndp v2, f1, f1
+; CHECK-NEXT: clrlwi r3, r3, 16
+; CHECK-NEXT: mtfprwz f0, r3
+; CHECK-NEXT: xscvhpdp v2, f0
; CHECK-NEXT: xscvdpqp v2, v2
; CHECK-NEXT: blr
;
@@ -1387,7 +1383,10 @@ define fp128 @ext(half %a) nounwind {
; CHECK-P8: # %bb.0: # %entry
; CHECK-P8-NEXT: mflr r0
; CHECK-P8-NEXT: stdu r1, -32(r1)
+; CHECK-P8-NEXT: clrldi r3, r3, 48
; CHECK-P8-NEXT: std r0, 48(r1)
+; CHECK-P8-NEXT: bl __extendhfsf2
+; CHECK-P8-NEXT: nop
; CHECK-P8-NEXT: bl __extendsfkf2
; CHECK-P8-NEXT: nop
; CHECK-P8-NEXT: addi r1, r1, 32
diff --git a/llvm/test/CodeGen/PowerPC/half.ll b/llvm/test/CodeGen/PowerPC/half.ll
index 903ea691ae6ba..6eaac1d7fc5c1 100644
--- a/llvm/test/CodeGen/PowerPC/half.ll
+++ b/llvm/test/CodeGen/PowerPC/half.ll
@@ -21,40 +21,13 @@
define void @store(half %x, ptr %p) nounwind {
; PPC32-LABEL: store:
; PPC32: # %bb.0:
-; PPC32-NEXT: mflr r0
-; PPC32-NEXT: stwu r1, -16(r1)
-; PPC32-NEXT: stw r0, 20(r1)
-; PPC32-NEXT: stw r30, 8(r1) # 4-byte Folded Spill
-; PPC32-NEXT: mr r30, r3
-; PPC32-NEXT: bl __truncsfhf2
-; PPC32-NEXT: sth r3, 0(r30)
-; PPC32-NEXT: lwz r30, 8(r1) # 4-byte Folded Reload
-; PPC32-NEXT: lwz r0, 20(r1)
-; PPC32-NEXT: addi r1, r1, 16
-; PPC32-NEXT: mtlr r0
+; PPC32-NEXT: sth r3, 0(r4)
; PPC32-NEXT: blr
;
-; P8-LABEL: store:
-; P8: # %bb.0:
-; P8-NEXT: mflr r0
-; P8-NEXT: std r30, -16(r1) # 8-byte Folded Spill
-; P8-NEXT: stdu r1, -48(r1)
-; P8-NEXT: std r0, 64(r1)
-; P8-NEXT: mr r30, r4
-; P8-NEXT: bl __truncsfhf2
-; P8-NEXT: nop
-; P8-NEXT: sth r3, 0(r30)
-; P8-NEXT: addi r1, r1, 48
-; P8-NEXT: ld r0, 16(r1)
-; P8-NEXT: ld r30, -16(r1) # 8-byte Folded Reload
-; P8-NEXT: mtlr r0
-; P8-NEXT: blr
-;
-; P9-LABEL: store:
-; P9: # %bb.0:
-; P9-NEXT: xscvdphp f0, f1
-; P9-NEXT: stxsihx f0, 0, r4
-; P9-NEXT: blr
+; CHECK-LABEL: store:
+; CHECK: # %bb.0:
+; CHECK-NEXT: sth r3, 0(r4)
+; CHECK-NEXT: blr
;
; SOFT-LABEL: store:
; SOFT: # %bb.0:
@@ -63,18 +36,7 @@ define void @store(half %x, ptr %p) nounwind {
;
; BE-LABEL: store:
; BE: # %bb.0:
-; BE-NEXT: mflr r0
-; BE-NEXT: stdu r1, -128(r1)
-; BE-NEXT: std r0, 144(r1)
-; BE-NEXT: std r30, 112(r1) # 8-byte Folded Spill
-; BE-NEXT: mr r30, r4
-; BE-NEXT: bl __truncsfhf2
-; BE-NEXT: nop
-; BE-NEXT: sth r3, 0(r30)
-; BE-NEXT: ld r30, 112(r1) # 8-byte Folded Reload
-; BE-NEXT: addi r1, r1, 128
-; BE-NEXT: ld r0, 16(r1)
-; BE-NEXT: mtlr r0
+; BE-NEXT: sth r3, 0(r4)
; BE-NEXT: blr
store half %x, ptr %p
ret void
@@ -83,34 +45,13 @@ define void @store(half %x, ptr %p) nounwind {
define half @return(ptr %p) nounwind {
; PPC32-LABEL: return:
; PPC32: # %bb.0:
-; PPC32-NEXT: mflr r0
-; PPC32-NEXT: stwu r1, -16(r1)
-; PPC32-NEXT: stw r0, 20(r1)
; PPC32-NEXT: lhz r3, 0(r3)
-; PPC32-NEXT: bl __extendhfsf2
-; PPC32-NEXT: lwz r0, 20(r1)
-; PPC32-NEXT: addi r1, r1, 16
-; PPC32-NEXT: mtlr r0
; PPC32-NEXT: blr
;
-; P8-LABEL: return:
-; P8: # %bb.0:
-; P8-NEXT: mflr r0
-; P8-NEXT: stdu r1, -32(r1)
-; P8-NEXT: std r0, 48(r1)
-; P8-NEXT: lhz r3, 0(r3)
-; P8-NEXT: bl __extendhfsf2
-; P8-NEXT: nop
-; P8-NEXT: addi r1, r1, 32
-; P8-NEXT: ld r0, 16(r1)
-; P8-NEXT: mtlr r0
-; P8-NEXT: blr
-;
-; P9-LABEL: return:
-; P9: # %bb.0:
-; P9-NEXT: lxsihzx f0, 0, r3
-; P9-NEXT: xscvhpdp f1, f0
-; P9-NEXT: blr
+; CHECK-LABEL: return:
+; CHECK: # %bb.0:
+; CHECK-NEXT: lhz r3, 0(r3)
+; CHECK-NEXT: blr
;
; SOFT-LABEL: return:
; SOFT: # %bb.0:
@@ -119,15 +60,7 @@ define half @return(ptr %p) nounwind {
;
; BE-LABEL: return:
; BE: # %bb.0:
-; BE-NEXT: mflr r0
-; BE-NEXT: stdu r1, -112(r1)
-; BE-NEXT: std r0, 128(r1)
; BE-NEXT: lhz r3, 0(r3)
-; BE-NEXT: bl __extendhfsf2
-; BE-NEXT: nop
-; BE-NEXT: addi r1, r1, 112
-; BE-NEXT: ld r0, 16(r1)
-; BE-NEXT: mtlr r0
; BE-NEXT: blr
%r = load half, ptr %p
ret half %r
@@ -317,11 +250,6 @@ define dso_local void @stored(ptr nocapture %a, double %b) local_unnamed_addr no
; SOFT-NEXT: std r0, 64(r1)
; SOFT-NEXT: bl __truncdfhf2
; SOFT-NEXT: nop
-; SOFT-NEXT: clrldi r3, r3, 48
-; SOFT-NEXT: bl __extendhfsf2
-; SOFT-NEXT: nop
-; SOFT-NEXT: bl __truncsfhf2
-; SOFT-NEXT: nop
; SOFT-NEXT: sth r3, 0(r30)
; SOFT-NEXT: addi r1, r1, 48
; SOFT-NEXT: ld r0, 16(r1)
@@ -400,11 +328,6 @@ define dso_local void @storef(ptr nocapture %a, float %b) local_unnamed_addr nou
; SOFT-NEXT: std r0, 64(r1)
; SOFT-NEXT: bl __truncsfhf2
; SOFT-NEXT: nop
-; SOFT-NEXT: clrldi r3, r3, 48
-; SOFT-NEXT: bl __extendhfsf2
-; SOFT-NEXT: nop
-; SOFT-NEXT: bl __truncsfhf2
-; SOFT-NEXT: nop
; SOFT-NEXT: sth r3, 0(r30)
; SOFT-NEXT: addi r1, r1, 48
; SOFT-NEXT: ld r0, 16(r1)
@@ -449,21 +372,8 @@ define void @test_load_store(ptr %in, ptr %out) nounwind {
;
; SOFT-LABEL: test_load_store:
; SOFT: # %bb.0:
-; SOFT-NEXT: mflr r0
-; SOFT-NEXT: std r30, -16(r1) # 8-byte Folded Spill
-; SOFT-NEXT: stdu r1, -48(r1)
-; SOFT-NEXT: std r0, 64(r1)
-; SOFT-NEXT: mr r30, r4
; SOFT-NEXT: lhz r3, 0(r3)
-; SOFT-NEXT: bl __extendhfsf2
-; SOFT-NEXT: nop
-; SOFT-NEXT: bl __truncsfhf2
-; SOFT-NEXT: nop
-; SOFT-NEXT: sth r3, 0(r30)
-; SOFT-NEXT: addi r1, r1, 48
-; SOFT-NEXT: ld r0, 16(r1)
-; SOFT-NEXT: ld r30, -16(r1) # 8-byte Folded Reload
-; SOFT-NEXT: mtlr r0
+; SOFT-NEXT: sth r3, 0(r4)
; SOFT-NEXT: blr
;
; BE-LABEL: test_load_store:
@@ -529,35 +439,11 @@ define void @test_bitcast_to_half(ptr %addr, i16 %in) nounwind {
define half @from_bits(i16 %x) nounwind {
; PPC32-LABEL: from_bits:
; PPC32: # %bb.0:
-; PPC32-NEXT: mflr r0
-; PPC32-NEXT: stwu r1, -16(r1)
-; PPC32-NEXT: clrlwi r3, r3, 16
-; PPC32-NEXT: stw r0, 20(r1)
-; PPC32-NEXT: bl __extendhfsf2
-; PPC32-NEXT: lwz r0, 20(r1)
-; PPC32-NEXT: addi r1, r1, 16
-; PPC32-NEXT: mtlr r0
; PPC32-NEXT: blr
;
-; P8-LABEL: from_bits:
-; P8: # %bb.0:
-; P8-NEXT: mflr r0
-; P8-NEXT: stdu r1, -32(r1)
-; P8-NEXT: clrldi r3, r3, 48
-; P8-NEXT: std r0, 48(r1)
-; P8-NEXT: bl __extendhfsf2
-; P8-NEXT: nop
-; P8-NEXT: addi r1, r1, 32
-; P8-NEXT: ld r0, 16(r1)
-; P8-NEXT: mtlr r0
-; P8-NEXT: blr
-;
-; P9-LABEL: from_bits:
-; P9: # %bb.0:
-; P9-NEXT: clrlwi r3, r3, 16
-; P9-NEXT: mtfprwz f0, r3
-; P9-NEXT: xscvhpdp f1, f0
-; P9-NEXT: blr
+; CHECK-LABEL: from_bits:
+; CHECK: # %bb.0:
+; CHECK-NEXT: blr
;
; SOFT-LABEL: from_bits:
; SOFT: # %bb.0:
@@ -565,15 +451,6 @@ define half @from_bits(i16 %x) nounwind {
;
; BE-LABEL: from_bits:
; BE: # %bb.0:
-; BE-NEXT: mflr r0
-; BE-NEXT: stdu r1, -112(r1)
-; BE-NEXT: clrldi r3, r3, 48
-; BE-NEXT: std r0, 128(r1)
-; BE-NEXT: bl __extendhfsf2
-; BE-NEXT: nop
-; BE-NEXT: addi r1, r1, 112
-; BE-NEXT: ld r0, 16(r1)
-; BE-NEXT: mtlr r0
; BE-NEXT: blr
%res = bitcast i16 %x to half
ret half %res
@@ -582,35 +459,11 @@ define half @from_bits(i16 %x) nounwind {
define i16 @to_bits(half %x) nounwind {
; PPC32-LABEL: to_bits:
; PPC32: # %bb.0:
-; PPC32-NEXT: mflr r0
-; PPC32-NEXT: stwu r1, -16(r1)
-; PPC32-NEXT: stw r0, 20(r1)
-; PPC32-NEXT: bl __truncsfhf2
-; PPC32-NEXT: clrlwi r3, r3, 16
-; PPC32-NEXT: lwz r0, 20(r1)
-; PPC32-NEXT: addi r1, r1, 16
-; PPC32-NEXT: mtlr r0
; PPC32-NEXT: blr
;
-; P8-LABEL: to_bits:
-; P8: # %bb.0:
-; P8-NEXT: mflr r0
-; P8-NEXT: stdu r1, -32(r1)
-; P8-NEXT: std r0, 48(r1)
-; P8-NEXT: bl __truncsfhf2
-; P8-NEXT: nop
-; P8-NEXT: clrldi r3, r3, 48
-; P8-NEXT: addi r1, r1, 32
-; P8-NEXT: ld r0, 16(r1)
-; P8-NEXT: mtlr r0
-; P8-NEXT: blr
-;
-; P9-LABEL: to_bits:
-; P9: # %bb.0:
-; P9-NEXT: xscvdphp f0, f1
-; P9-NEXT: mffprwz r3, f0
-; P9-NEXT: clrlwi r3, r3, 16
-; P9-NEXT: blr
+; CHECK-LABEL: to_bits:
+; CHECK: # %bb.0:
+; CHECK-NEXT: blr
;
; SOFT-LABEL: to_bits:
; SOFT: # %bb.0:
@@ -618,15 +471,6 @@ define i16 @to_bits(half %x) nounwind {
;
; BE-LABEL: to_bits:
; BE: # %bb.0:
-; BE-NEXT: mflr r0
-; BE-NEXT: stdu r1, -112(r1)
-; BE-NEXT: std r0, 128(r1)
-; BE-NEXT: bl __truncsfhf2
-; BE-NEXT: nop
-; BE-NEXT: clrldi r3, r3, 48
-; BE-NEXT: addi r1, r1, 112
-; BE-NEXT: ld r0, 16(r1)
-; BE-NEXT: mtlr r0
; BE-NEXT: blr
%res = bitcast half %x to i16
ret i16 %res
@@ -804,11 +648,6 @@ define void @test_trunc32(float %in, ptr %addr) nounwind {
; SOFT-NEXT: mr r30, r4
; SOFT-NEXT: bl __truncsfhf2
; SOFT-NEXT: nop
-; SOFT-NEXT: clrldi r3, r3, 48
-; SOFT-NEXT: bl __extendhfsf2
-; SOFT-NEXT: nop
-; SOFT-NEXT: bl __truncsfhf2
-; SOFT-NEXT: nop
; SOFT-NEXT: sth r3, 0(r30)
; SOFT-NEXT: addi r1, r1, 48
; SOFT-NEXT: ld r0, 16(r1)
@@ -882,11 +721,6 @@ define void @test_trunc64(double %in, ptr %addr) nounwind {
; SOFT-NEXT: mr r30, r4
; SOFT-NEXT: bl __truncdfhf2
; SOFT-NEXT: nop
-; SOFT-NEXT: clrldi r3, r3, 48
-; SOFT-NEXT: bl __extendhfsf2
-; SOFT-NEXT: nop
-; SOFT-NEXT: bl __truncsfhf2
-; SOFT-NEXT: nop
; SOFT-NEXT: sth r3, 0(r30)
; SOFT-NEXT: addi r1, r1, 48
; SOFT-NEXT: ld r0, 16(r1)
@@ -1041,11 +875,6 @@ define void @test_sitofp_i64(i64 %a, ptr %p) nounwind {
; SOFT-NEXT: clrldi r3, r3, 32
; SOFT-NEXT: bl __truncsfhf2
; SOFT-NEXT: nop
-; SOFT-NEXT: clrldi r3, r3, 48
-; SOFT-NEXT: bl __extendhfsf2
-; SOFT-NEXT: nop
-; SOFT-NEXT: bl __truncsfhf2
-; SOFT-NEXT: nop
; SOFT-NEXT: sth r3, 0(r30)
; SOFT-NEXT: addi r1, r1, 48
; SOFT-NEXT: ld r0, 16(r1)
@@ -1228,11 +1057,6 @@ define void @test_uitofp_i64(i64 %a, ptr %p) nounwind {
; SOFT-NEXT: nop
; SOFT-NEXT: bl __truncsfhf2
; SOFT-NEXT: nop
-; SOFT-NEXT: clrldi r3, r3, 48
-; SOFT-NEXT: bl __extendhfsf2
-; SOFT-NEXT: nop
-; SOFT-NEXT: bl __truncsfhf2
-; SOFT-NEXT: nop
; SOFT-NEXT: sth r3, 0(r30)
; SOFT-NEXT: addi r1, r1, 48
; SOFT-NEXT: ld r0, 16(r1)
@@ -1339,67 +1163,89 @@ define <4 x float> @test_extend32_vec4(ptr %p) nounwind {
; P8-LABEL: test_extend32_vec4:
; P8: # %bb.0:
; P8-NEXT: mflr r0
-; P8-NEXT: stdu r1, -112(r1)
-; P8-NEXT: li r4, 48
-; P8-NEXT: std r0, 128(r1)
-; P8-NEXT: std r30, 96(r1) # 8-byte Folded Spill
-; P8-NEXT: mr r30, r3
-; P8-NEXT: lhz r3, 6(r3)
-; P8-NEXT: stxvd2x vs61, r1, r4 # 16-byte Folded Spill
-; P8-NEXT: li r4, 64
-; P8-NEXT: stxvd2x vs62, r1, r4 # 16-byte Folded Spill
+; P8-NEXT: stdu r1, -144(r1)
; P8-NEXT: li r4, 80
+; P8-NEXT: std r0, 160(r1)
+; P8-NEXT: std r29, 120(r1) # 8-byte Folded Spill
+; P8-NEXT: std r30, 128(r1) # 8-byte Folded Spill
+; P8-NEXT: stxvd2x vs62, r1, r4 # 16-byte Folded Spill
+; P8-NEXT: li r4, 96
; P8-NEXT: stxvd2x vs63, r1, r4 # 16-byte Folded Spill
+; P8-NEXT: lwz r4, 4(r3)
+; P8-NEXT: stw r4, 64(r1)
+; P8-NEXT: lwz r3, 0(r3)
+; P8-NEXT: stw r3, 48(r1)
+; P8-NEXT: addi r3, r1, 64
+; P8-NEXT: lxvd2x vs62, 0, r3
+; P8-NEXT: addi r3, r1, 48
+; P8-NEXT: lxvd2x vs0, 0, r3
+; P8-NEXT: mffprd r30, f0
+; P8-NEXT: clrldi r3, r30, 48
+; P8-NEXT: clrlwi r3, r3, 16
; P8-NEXT: bl __extendhfsf2
; P8-NEXT: nop
-; P8-NEXT: lhz r3, 2(r30)
+; P8-NEXT: mfvsrd r29, vs62
; P8-NEXT: xxlor vs63, f1, f1
+; P8-NEXT: clrldi r3, r29, 48
+; P8-NEXT: clrlwi r3, r3, 16
; P8-NEXT: bl __extendhfsf2
; P8-NEXT: nop
-; P8-NEXT: lhz r3, 4(r30)
-; P8-NEXT: xxlor vs62, f1, f1
+; P8-NEXT: rldicl r3, r30, 48, 48
+; P8-NEXT: xxmrghd vs0, vs1, vs63
+; P8-NEXT: clrlwi r3, r3, 16
+; P8-NEXT: xvcvdpsp vs62, vs0
; P8-NEXT: bl __extendhfsf2
; P8-NEXT: nop
-; P8-NEXT: lhz r3, 0(r30)
-; P8-NEXT: xxlor vs61, f1, f1
+; P8-NEXT: rldicl r3, r29, 48, 48
+; P8-NEXT: xxlor vs63, f1, f1
+; P8-NEXT: clrlwi r3, r3, 16
; P8-NEXT: bl __extendhfsf2
; P8-NEXT: nop
-; P8-NEXT: li r3, 80
-; P8-NEXT: xxmrghd vs0, vs61, vs1
-; P8-NEXT: xxmrghd vs1, vs63, vs62
-; P8-NEXT: ld r30, 96(r1) # 8-byte Folded Reload
-; P8-NEXT: lxvd2x vs63, r1, r3 # 16-byte Folded Reload
-; P8-NEXT: li r3, 64
+; P8-NEXT: xxmrghd vs0, vs1, vs63
+; P8-NEXT: li r3, 96
+; P8-NEXT: ld r30, 128(r1) # 8-byte Folded Reload
+; P8-NEXT: ld r29, 120(r1) # 8-byte Folded Reload
; P8-NEXT: xvcvdpsp vs34, vs0
-; P8-NEXT: xvcvdpsp vs35, vs1
+; P8-NEXT: lxvd2x vs63, r1, r3 # 16-byte Folded Reload
+; P8-NEXT: li r3, 80
+; P8-NEXT: vmrgew v2, v2, v30
; P8-NEXT: lxvd2x vs62, r1, r3 # 16-byte Folded Reload
-; P8-NEXT: li r3, 48
-; P8-NEXT: lxvd2x vs61, r1, r3 # 16-byte Folded Reload
-; P8-NEXT: vmrgew v2, v3, v2
-; P8-NEXT: addi r1, r1, 112
+; P8-NEXT: addi r1, r1, 144
; P8-NEXT: ld r0, 16(r1)
; P8-NEXT: mtlr r0
; P8-NEXT: blr
;
; P9-LABEL: test_extend32_vec4:
; P9: # %bb.0:
-; P9-NEXT: lhz r4, 6(r3)
+; P9-NEXT: lwz r4, 4(r3)
+; P9-NEXT: stw r4, -16(r1)
+; P9-NEXT: lwz r3, 0(r3)
+; P9-NEXT: lxv vs34, -16(r1)
+; P9-NEXT: stw r3, -32(r1)
+; P9-NEXT: li r3, 0
+; P9-NEXT: lxv vs35, -32(r1)
+; P9-NEXT: vex...
[truncated]
|
🐧 Linux x64 Test Results
✅ The build succeeded and all tests passed. |
🪟 Windows x64 Test Results
✅ The build succeeded and all tests passed. |
nikic
left a comment
LGTM
…at` (llvm#152632)

On PowerPC targets, `half` uses the default legalization of promoting to a `f32`. However, this has some fundamental issues related to inability to round trip. Resolve this by switching to the soft legalization, which passes `f16` as an `i16`.

The PowerPC ABI Specification does not define a `_Float16` type, so the calling convention changes are acceptable.

Fixes the PowerPC part of llvm#97975
Fixes the PowerPC part of llvm#97981