
@tgross35
Contributor

@tgross35 tgross35 commented Aug 8, 2025

On PowerPC targets, half uses the default legalization of promoting to
an f32. However, this has some fundamental issues, notably the inability
to round-trip values. Resolve this by switching to the soft legalization, which
passes f16 as an i16.

The PowerPC ABI Specification does not define a _Float16 type, so the
calling convention changes are acceptable.

Fixes the PowerPC part of #97975
Fixes the PowerPC part of #97981
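
For readers less familiar with the two legalizations: as I understand the documentation of the `softPromoteHalfType()` hook in LLVM's `TargetLowering.h` (the hook this PR overrides to return `true` in `PPCISelLowering.h`), soft promotion widens each `half` operation to `f32` individually and converts the result back to `f16`, while the default promotes chains of operations and keeps intermediate results in `f32`. The C++ sketch below models that difference in software. It is an illustration only: `half` values are held as raw `uint16_t` bit patterns, and the helpers `half_to_float`, `float_to_half`, `add_soft` and `add3_chain_f32` are hypothetical names, not LLVM or compiler-rt APIs.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a float as its IEEE 754 binary32 bit pattern and back.
static uint32_t float_bits(float f) { uint32_t u; std::memcpy(&u, &f, sizeof u); return u; }
static float bits_float(uint32_t u) { float f; std::memcpy(&f, &u, sizeof f); return f; }

// Widen binary16 (held as uint16_t) to float. Every binary16 value is exactly
// representable as a float, so this conversion never rounds.
float half_to_float(uint16_t h) {
    const uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
    const uint32_t exponent = (h >> 10) & 0x1Fu;
    const uint32_t mantissa = h & 0x3FFu;
    if (exponent == 0x1Fu)  // Inf or NaN: payload copied (no quieting in this sketch)
        return bits_float(sign | 0x7F800000u | (mantissa << 13));
    if (exponent == 0) {    // zero or subnormal: mantissa * 2^-24
        const float v = static_cast<float>(mantissa) * 0x1p-24f;
        return sign ? -v : v;
    }
    return bits_float(sign | ((exponent - 15 + 127) << 23) | (mantissa << 13));
}

// Narrow float to binary16 with round-to-nearest-even.
uint16_t float_to_half(float f) {
    const uint32_t x = float_bits(f);
    const uint32_t sign = (x >> 16) & 0x8000u;
    const uint32_t mag = x & 0x7FFFFFFFu;
    if (mag >= 0x7F800000u)  // Inf stays Inf; any NaN becomes a quiet NaN (payload dropped)
        return static_cast<uint16_t>(sign | 0x7C00u | (mag > 0x7F800000u ? 0x0200u : 0u));
    if (mag < 0x38800000u) {  // |f| < 2^-14: subnormal (or zero) in binary16
        // Scaling by 2^24 is exact; nearbyint rounds to nearest-even in the
        // default floating-point environment.
        const float m = std::nearbyint(bits_float(mag) * 0x1p24f);
        assert(m <= 1024.0f);  // 1024 (0x0400) encodes the smallest normal, 2^-14
        return static_cast<uint16_t>(sign | static_cast<uint32_t>(m));
    }
    const uint32_t exponent = (mag >> 23) - 127 + 15;  // rebias; >= 1 here
    if (exponent >= 0x1Fu)  // too large for binary16: overflow to Inf
        return static_cast<uint16_t>(sign | 0x7C00u);
    const uint32_t mantissa = mag & 0x7FFFFFu;
    uint32_t h = (exponent << 10) | (mantissa >> 13);
    const uint32_t rest = mantissa & 0x1FFFu;  // the 13 bits that do not fit
    if (rest > 0x1000u || (rest == 0x1000u && (h & 1u)))
        ++h;  // round up; a carry into the exponent (even up to Inf) is correct
    return static_cast<uint16_t>(sign | h);
}

// Soft promotion: each operation is widened to f32, computed, and rounded back
// to binary16 immediately, so every intermediate result is a real half value.
uint16_t add_soft(uint16_t a, uint16_t b) {
    return float_to_half(half_to_float(a) + half_to_float(b));
}

// Chain promotion: the intermediate sum stays in f32 and only the final
// result is rounded to binary16.
uint16_t add3_chain_f32(uint16_t a, uint16_t b, uint16_t c) {
    const float acc = half_to_float(a) + half_to_float(b);  // not rounded to half
    return float_to_half(acc + half_to_float(c));
}
```

With `0x3C00` (1.0) and `0x1000` (2^-11, half an ulp of 1.0 in binary16), `add_soft(add_soft(0x3C00, 0x1000), 0x1000)` yields `0x3C00`, because each intermediate 1 + 2^-11 is a tie that rounds to even, back to 1.0. `add3_chain_f32(0x3C00, 0x1000, 0x1000)` yields `0x3C01` (1 + 2^-10), because the intermediate sum is exact in `f32`.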

@tgross35
Contributor Author

tgross35 commented Aug 8, 2025

The second commit is the interesting part here. The first commit is in a standalone PR at #152625 and should land first.

I am referencing https://ftp.rtems.org/pub/rtems/people/sebh/ABI64BitOpenPOWERv1.1_16July2015_pub.pdf for the ABI not having _Float16, not sure if there is a newer version available.

There is a new crash expanding LRINT that I am trying to figure out.

@Gelbpunkt
Contributor

Gelbpunkt commented Aug 8, 2025

> I am referencing https://ftp.rtems.org/pub/rtems/people/sebh/ABI64BitOpenPOWERv1.1_16July2015_pub.pdf for the ABI not having _Float16, not sure if there is a newer version available.

The latest specification of the ELFv2 ABI can be found here: https://files.openpower.foundation/s/cfA2oFPXbbZwEBK :)

"Table 2.13. Decimal Floating-Point Types" does not contain _Decimal16, so I think you are correct.

@tgross35
Contributor Author

tgross35 commented Aug 8, 2025

Oh that's great, thank you for the link!

> "Table 2.13. Decimal Floating-Point Types" does not contain _Decimal16, so I think you are correct.

I think the correct place would be "Table 2.11. Scalar Types" under "Binary Floating-Point" (_Float16 is base-2, _Decimal16 is base-10), but the answer is the same.

@alexrp alexrp mentioned this pull request Aug 2, 2025
@tgross35
Contributor Author

tgross35 commented Aug 8, 2025

I'm stuck on the new crash. I thought it just needed a new setOperationAction, but that doesn't seem to have any effect. I asked at https://discord.com/channels/636084430946959380/636732535434510338/1403227999050006621

@tgross35 tgross35 force-pushed the ppc-soft-promote-half branch 2 times, most recently from 35a405b to 77508b7 on August 8, 2025 12:45
@tgross35
Contributor Author

tgross35 commented Aug 8, 2025

@chenzheng1030, @EsmeYi, @lei137, @RolandF77 could you review this?

Only the last commit, "[PowerPC] Change half to use soft promotion", needs review for this PR. There are other commits here because a few other changes need to land first:

I'm avoiding marking this as "ready for review" for now so it doesn't ping everybody from the backends touched by those other PRs, but the change to soft promotion here should be ready for review.

@tgross35 tgross35 force-pushed the ppc-soft-promote-half branch from 77508b7 to a8d24ba on August 8, 2025 13:16
@tgross35
Contributor Author

tgross35 commented Aug 8, 2025

(The current failure in "ERROR: test_modulelist_deadlock" must be unrelated.)

On PowerPC targets, `half` uses the default legalization of promoting to
an `f32`. However, this has some fundamental issues, notably the inability
to round-trip values. Resolve this by switching to the soft legalization, which
passes `f16` as an `i16`.

The PowerPC ABI Specification does not define a `_Float16` type, so the
calling convention changes are acceptable.

Fixes the PowerPC portion of [1]. A similar change was done for MIPS in
f0231b6 ("[MIPS] Use softPromoteHalf legalization for fp16 rather
than PromoteFloat (llvm#110199)") and for LoongArch in 13280d9
("[loongarch][DAG][FREEZE] Fix crash when FREEZE a half(f16) type on
loongarch (llvm#107791)").

[1]: llvm#97975
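
The `half.ll` test updates in this PR show the round-trip problem at the ABI level: `from_bits` (`bitcast i16 to half`) and `to_bits` (`bitcast half to i16`) previously went through `__extendhfsf2`/`__truncsfhf2` (or `xscvhpdp`/`xscvdphp` on P9), and with soft promotion they reduce to a bare `blr`. The C++ sketch below models one consequence, assuming the old path behaves like an IEEE 754 format conversion, which delivers a quiet NaN when given a signaling NaN. The names `roundtrip_via_f32` and `roundtrip_as_i16` are hypothetical, and keeping the remaining NaN payload bits is a simplifying assumption of the model, not a claim about the actual libcalls or instructions.

```cpp
#include <cstdint>

// binary16 layout: 1 sign bit, 5 exponent bits, 10 significand bits.
// A NaN has an all-ones exponent and a non-zero significand; the top
// significand bit (0x0200) is set for a quiet NaN and clear for a signaling NaN.
constexpr uint16_t kHalfExp = 0x7C00;
constexpr uint16_t kHalfMan = 0x03FF;
constexpr uint16_t kHalfQuiet = 0x0200;

constexpr bool half_is_nan(uint16_t h) {
    return (h & kHalfExp) == kHalfExp && (h & kHalfMan) != 0;
}

constexpr bool half_is_snan(uint16_t h) {
    return half_is_nan(h) && (h & kHalfQuiet) == 0;
}

// Model of the old lowering: a bitcast round trip (from_bits, then to_bits)
// performs a real half -> float -> half conversion. Finite values and
// infinities come back unchanged; a NaN comes back quieted. Keeping the rest
// of the payload is an assumption of this model.
constexpr uint16_t roundtrip_via_f32(uint16_t h) {
    if (!half_is_nan(h))
        return h;
    // Widen: move the 10-bit payload to the top of the 23-bit float
    // significand and set the float quiet bit (0x00400000).
    const uint32_t f = (static_cast<uint32_t>(h & 0x8000u) << 16) | 0x7F800000u |
                       (static_cast<uint32_t>(h & kHalfMan) << 13) | 0x00400000u;
    // Narrow: keep the sign and the top 10 significand bits.
    return static_cast<uint16_t>(((f >> 16) & 0x8000u) | kHalfExp |
                                 ((f >> 13) & kHalfMan));
}

// Model of soft promotion: half travels as its i16 bit pattern, so the same
// round trip is just a register move.
constexpr uint16_t roundtrip_as_i16(uint16_t h) { return h; }
```

Under this model, the signaling NaN `0x7D00` comes back from `roundtrip_via_f32` as the quiet NaN `0x7F00`, while `roundtrip_as_i16` returns `0x7D00` unchanged; ordinary values such as `0x3C00` (1.0) survive either path.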
@tgross35 tgross35 force-pushed the ppc-soft-promote-half branch from a636c23 to f0508b4 on January 8, 2026 13:01
@tgross35 tgross35 marked this pull request as ready for review January 8, 2026 13:10
@tgross35
Contributor Author

tgross35 commented Jan 8, 2026

I think this should be ready to go now that #152684 has landed. Cc @nikic for review as well.

@llvmbot
Member

llvmbot commented Jan 8, 2026

@llvm/pr-subscribers-backend-powerpc

Author: Trevor Gross (tgross35)

Changes

Patch is 291.93 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/152632.diff

13 Files Affected:

  • (modified) llvm/docs/ReleaseNotes.md (+2)
  • (modified) llvm/lib/Target/PowerPC/PPCISelLowering.h (+2)
  • (modified) llvm/test/CodeGen/Generic/half.ll (+3-3)
  • (modified) llvm/test/CodeGen/PowerPC/atomics.ll (+5-57)
  • (modified) llvm/test/CodeGen/PowerPC/f128-conv.ll (+6-7)
  • (modified) llvm/test/CodeGen/PowerPC/half.ll (+262-577)
  • (modified) llvm/test/CodeGen/PowerPC/ldexp.ll (+3-3)
  • (modified) llvm/test/CodeGen/PowerPC/llvm.frexp.ll (+67-76)
  • (modified) llvm/test/CodeGen/PowerPC/llvm.modf.ll (+49-34)
  • (modified) llvm/test/CodeGen/PowerPC/pr48519.ll (+30-75)
  • (modified) llvm/test/CodeGen/PowerPC/pr49092.ll (-12)
  • (modified) llvm/test/CodeGen/PowerPC/vector-llrint.ll (+873-1682)
  • (modified) llvm/test/CodeGen/PowerPC/vector-lrint.ll (+869-1678)
diff --git a/llvm/docs/ReleaseNotes.md b/llvm/docs/ReleaseNotes.md
index 910a50214df2f..169d4026820d8 100644
--- a/llvm/docs/ReleaseNotes.md
+++ b/llvm/docs/ReleaseNotes.md
@@ -142,6 +142,8 @@ Changes to the MIPS Backend
 Changes to the PowerPC Backend
 ------------------------------
 
+* `half` now uses a soft float ABI, which works correctly in more cases.
+
 Changes to the RISC-V Backend
 -----------------------------
 
diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.h b/llvm/lib/Target/PowerPC/PPCISelLowering.h
index daae839479c3c..97546f37e6d5e 100644
--- a/llvm/lib/Target/PowerPC/PPCISelLowering.h
+++ b/llvm/lib/Target/PowerPC/PPCISelLowering.h
@@ -212,6 +212,8 @@ namespace llvm {
 
     bool useSoftFloat() const override;
 
+    bool softPromoteHalfType() const override { return true; }
+
     bool hasSPE() const;
 
     MVT getScalarShiftAmountTy(const DataLayout &, EVT) const override {
diff --git a/llvm/test/CodeGen/Generic/half.ll b/llvm/test/CodeGen/Generic/half.ll
index ef7bfe2f2d9ce..f19d6da4eeb7b 100644
--- a/llvm/test/CodeGen/Generic/half.ll
+++ b/llvm/test/CodeGen/Generic/half.ll
@@ -29,9 +29,9 @@
 ; RUN: %if mips-registered-target        %{ llc %s -o - -mtriple=mipsel-unknown-linux-gnu        | FileCheck %s --check-prefixes=ALL,CHECK %}
 ; RUN: %if msp430-registered-target      %{ llc %s -o - -mtriple=msp430-none-elf                 | FileCheck %s --check-prefixes=ALL,CHECK %}
 ; RUN: %if nvptx-registered-target       %{ llc %s -o - -mtriple=nvptx64-nvidia-cuda             | FileCheck %s --check-prefixes=NOCRASH   %}
-; RUN: %if powerpc-registered-target     %{ llc %s -o - -mtriple=powerpc-unknown-linux-gnu       | FileCheck %s --check-prefixes=ALL,BAD   %}
-; RUN: %if powerpc-registered-target     %{ llc %s -o - -mtriple=powerpc64-unknown-linux-gnu     | FileCheck %s --check-prefixes=ALL,BAD   %}
-; RUN: %if powerpc-registered-target     %{ llc %s -o - -mtriple=powerpc64le-unknown-linux-gnu   | FileCheck %s --check-prefixes=ALL,BAD   %}
+; RUN: %if powerpc-registered-target     %{ llc %s -o - -mtriple=powerpc-unknown-linux-gnu       | FileCheck %s --check-prefixes=ALL,CHECK %}
+; RUN: %if powerpc-registered-target     %{ llc %s -o - -mtriple=powerpc64-unknown-linux-gnu     | FileCheck %s --check-prefixes=ALL,CHECK %}
+; RUN: %if powerpc-registered-target     %{ llc %s -o - -mtriple=powerpc64le-unknown-linux-gnu   | FileCheck %s --check-prefixes=ALL,CHECK %}
 ; RUN: %if riscv-registered-target       %{ llc %s -o - -mtriple=riscv32-unknown-linux-gnu       | FileCheck %s --check-prefixes=ALL,CHECK %}
 ; RUN: %if riscv-registered-target       %{ llc %s -o - -mtriple=riscv64-unknown-linux-gnu       | FileCheck %s --check-prefixes=ALL,CHECK %}
 ; RUN: %if sparc-registered-target       %{ llc %s -o - -mtriple=sparc-unknown-linux-gnu         | FileCheck %s --check-prefixes=ALL,CHECK %}
diff --git a/llvm/test/CodeGen/PowerPC/atomics.ll b/llvm/test/CodeGen/PowerPC/atomics.ll
index ff1a7222cc92c..54a35dab2a422 100644
--- a/llvm/test/CodeGen/PowerPC/atomics.ll
+++ b/llvm/test/CodeGen/PowerPC/atomics.ll
@@ -469,39 +469,20 @@ define i64 @and_i64_release(ptr %mem, i64 %operand) {
 define half @load_atomic_f16__seq_cst(ptr %ptr) {
 ; PPC32-LABEL: load_atomic_f16__seq_cst:
 ; PPC32:       # %bb.0:
-; PPC32-NEXT:    mflr r0
-; PPC32-NEXT:    stwu r1, -16(r1)
-; PPC32-NEXT:    stw r0, 20(r1)
-; PPC32-NEXT:    .cfi_def_cfa_offset 16
-; PPC32-NEXT:    .cfi_offset lr, 4
 ; PPC32-NEXT:    sync
 ; PPC32-NEXT:    lhz r3, 0(r3)
 ; PPC32-NEXT:    cmpw cr7, r3, r3
 ; PPC32-NEXT:    bne- cr7, .+4
 ; PPC32-NEXT:    isync
-; PPC32-NEXT:    bl __extendhfsf2
-; PPC32-NEXT:    lwz r0, 20(r1)
-; PPC32-NEXT:    addi r1, r1, 16
-; PPC32-NEXT:    mtlr r0
 ; PPC32-NEXT:    blr
 ;
 ; PPC64-LABEL: load_atomic_f16__seq_cst:
 ; PPC64:       # %bb.0:
-; PPC64-NEXT:    mflr r0
-; PPC64-NEXT:    stdu r1, -112(r1)
-; PPC64-NEXT:    std r0, 128(r1)
-; PPC64-NEXT:    .cfi_def_cfa_offset 112
-; PPC64-NEXT:    .cfi_offset lr, 16
 ; PPC64-NEXT:    sync
 ; PPC64-NEXT:    lhz r3, 0(r3)
 ; PPC64-NEXT:    cmpd cr7, r3, r3
 ; PPC64-NEXT:    bne- cr7, .+4
 ; PPC64-NEXT:    isync
-; PPC64-NEXT:    bl __extendhfsf2
-; PPC64-NEXT:    nop
-; PPC64-NEXT:    addi r1, r1, 112
-; PPC64-NEXT:    ld r0, 16(r1)
-; PPC64-NEXT:    mtlr r0
 ; PPC64-NEXT:    blr
   %val = load atomic half, ptr %ptr seq_cst, align 2
   ret half %val
@@ -575,44 +556,11 @@ define double @load_atomic_f64__seq_cst(ptr %ptr) {
 }
 
 define void @store_atomic_f16__seq_cst(ptr %ptr, half %val1) {
-; PPC32-LABEL: store_atomic_f16__seq_cst:
-; PPC32:       # %bb.0:
-; PPC32-NEXT:    mflr r0
-; PPC32-NEXT:    stwu r1, -16(r1)
-; PPC32-NEXT:    stw r0, 20(r1)
-; PPC32-NEXT:    .cfi_def_cfa_offset 16
-; PPC32-NEXT:    .cfi_offset lr, 4
-; PPC32-NEXT:    .cfi_offset r30, -8
-; PPC32-NEXT:    stw r30, 8(r1) # 4-byte Folded Spill
-; PPC32-NEXT:    mr r30, r3
-; PPC32-NEXT:    bl __truncsfhf2
-; PPC32-NEXT:    sync
-; PPC32-NEXT:    sth r3, 0(r30)
-; PPC32-NEXT:    lwz r30, 8(r1) # 4-byte Folded Reload
-; PPC32-NEXT:    lwz r0, 20(r1)
-; PPC32-NEXT:    addi r1, r1, 16
-; PPC32-NEXT:    mtlr r0
-; PPC32-NEXT:    blr
-;
-; PPC64-LABEL: store_atomic_f16__seq_cst:
-; PPC64:       # %bb.0:
-; PPC64-NEXT:    mflr r0
-; PPC64-NEXT:    stdu r1, -128(r1)
-; PPC64-NEXT:    std r0, 144(r1)
-; PPC64-NEXT:    .cfi_def_cfa_offset 128
-; PPC64-NEXT:    .cfi_offset lr, 16
-; PPC64-NEXT:    .cfi_offset r30, -16
-; PPC64-NEXT:    std r30, 112(r1) # 8-byte Folded Spill
-; PPC64-NEXT:    mr r30, r3
-; PPC64-NEXT:    bl __truncsfhf2
-; PPC64-NEXT:    nop
-; PPC64-NEXT:    sync
-; PPC64-NEXT:    sth r3, 0(r30)
-; PPC64-NEXT:    ld r30, 112(r1) # 8-byte Folded Reload
-; PPC64-NEXT:    addi r1, r1, 128
-; PPC64-NEXT:    ld r0, 16(r1)
-; PPC64-NEXT:    mtlr r0
-; PPC64-NEXT:    blr
+; CHECK-LABEL: store_atomic_f16__seq_cst:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    sync
+; CHECK-NEXT:    sth r4, 0(r3)
+; CHECK-NEXT:    blr
   store atomic half %val1, ptr %ptr seq_cst, align 2
   ret void
 }
diff --git a/llvm/test/CodeGen/PowerPC/f128-conv.ll b/llvm/test/CodeGen/PowerPC/f128-conv.ll
index f8b2861156db4..080843217e8c9 100644
--- a/llvm/test/CodeGen/PowerPC/f128-conv.ll
+++ b/llvm/test/CodeGen/PowerPC/f128-conv.ll
@@ -1349,9 +1349,6 @@ define half @trunc(fp128 %a) nounwind {
 ; CHECK-NEXT:    std r0, 48(r1)
 ; CHECK-NEXT:    bl __trunckfhf2
 ; CHECK-NEXT:    nop
-; CHECK-NEXT:    clrlwi r3, r3, 16
-; CHECK-NEXT:    mtfprwz f0, r3
-; CHECK-NEXT:    xscvhpdp f1, f0
 ; CHECK-NEXT:    addi r1, r1, 32
 ; CHECK-NEXT:    ld r0, 16(r1)
 ; CHECK-NEXT:    mtlr r0
@@ -1364,9 +1361,6 @@ define half @trunc(fp128 %a) nounwind {
 ; CHECK-P8-NEXT:    std r0, 48(r1)
 ; CHECK-P8-NEXT:    bl __trunckfhf2
 ; CHECK-P8-NEXT:    nop
-; CHECK-P8-NEXT:    clrldi r3, r3, 48
-; CHECK-P8-NEXT:    bl __extendhfsf2
-; CHECK-P8-NEXT:    nop
 ; CHECK-P8-NEXT:    addi r1, r1, 32
 ; CHECK-P8-NEXT:    ld r0, 16(r1)
 ; CHECK-P8-NEXT:    mtlr r0
@@ -1379,7 +1373,9 @@ entry:
 define fp128 @ext(half %a) nounwind {
 ; CHECK-LABEL: ext:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    xscpsgndp v2, f1, f1
+; CHECK-NEXT:    clrlwi r3, r3, 16
+; CHECK-NEXT:    mtfprwz f0, r3
+; CHECK-NEXT:    xscvhpdp v2, f0
 ; CHECK-NEXT:    xscvdpqp v2, v2
 ; CHECK-NEXT:    blr
 ;
@@ -1387,7 +1383,10 @@ define fp128 @ext(half %a) nounwind {
 ; CHECK-P8:       # %bb.0: # %entry
 ; CHECK-P8-NEXT:    mflr r0
 ; CHECK-P8-NEXT:    stdu r1, -32(r1)
+; CHECK-P8-NEXT:    clrldi r3, r3, 48
 ; CHECK-P8-NEXT:    std r0, 48(r1)
+; CHECK-P8-NEXT:    bl __extendhfsf2
+; CHECK-P8-NEXT:    nop
 ; CHECK-P8-NEXT:    bl __extendsfkf2
 ; CHECK-P8-NEXT:    nop
 ; CHECK-P8-NEXT:    addi r1, r1, 32
diff --git a/llvm/test/CodeGen/PowerPC/half.ll b/llvm/test/CodeGen/PowerPC/half.ll
index 903ea691ae6ba..6eaac1d7fc5c1 100644
--- a/llvm/test/CodeGen/PowerPC/half.ll
+++ b/llvm/test/CodeGen/PowerPC/half.ll
@@ -21,40 +21,13 @@
 define void @store(half %x, ptr %p) nounwind {
 ; PPC32-LABEL: store:
 ; PPC32:       # %bb.0:
-; PPC32-NEXT:    mflr r0
-; PPC32-NEXT:    stwu r1, -16(r1)
-; PPC32-NEXT:    stw r0, 20(r1)
-; PPC32-NEXT:    stw r30, 8(r1) # 4-byte Folded Spill
-; PPC32-NEXT:    mr r30, r3
-; PPC32-NEXT:    bl __truncsfhf2
-; PPC32-NEXT:    sth r3, 0(r30)
-; PPC32-NEXT:    lwz r30, 8(r1) # 4-byte Folded Reload
-; PPC32-NEXT:    lwz r0, 20(r1)
-; PPC32-NEXT:    addi r1, r1, 16
-; PPC32-NEXT:    mtlr r0
+; PPC32-NEXT:    sth r3, 0(r4)
 ; PPC32-NEXT:    blr
 ;
-; P8-LABEL: store:
-; P8:       # %bb.0:
-; P8-NEXT:    mflr r0
-; P8-NEXT:    std r30, -16(r1) # 8-byte Folded Spill
-; P8-NEXT:    stdu r1, -48(r1)
-; P8-NEXT:    std r0, 64(r1)
-; P8-NEXT:    mr r30, r4
-; P8-NEXT:    bl __truncsfhf2
-; P8-NEXT:    nop
-; P8-NEXT:    sth r3, 0(r30)
-; P8-NEXT:    addi r1, r1, 48
-; P8-NEXT:    ld r0, 16(r1)
-; P8-NEXT:    ld r30, -16(r1) # 8-byte Folded Reload
-; P8-NEXT:    mtlr r0
-; P8-NEXT:    blr
-;
-; P9-LABEL: store:
-; P9:       # %bb.0:
-; P9-NEXT:    xscvdphp f0, f1
-; P9-NEXT:    stxsihx f0, 0, r4
-; P9-NEXT:    blr
+; CHECK-LABEL: store:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    sth r3, 0(r4)
+; CHECK-NEXT:    blr
 ;
 ; SOFT-LABEL: store:
 ; SOFT:       # %bb.0:
@@ -63,18 +36,7 @@ define void @store(half %x, ptr %p) nounwind {
 ;
 ; BE-LABEL: store:
 ; BE:       # %bb.0:
-; BE-NEXT:    mflr r0
-; BE-NEXT:    stdu r1, -128(r1)
-; BE-NEXT:    std r0, 144(r1)
-; BE-NEXT:    std r30, 112(r1) # 8-byte Folded Spill
-; BE-NEXT:    mr r30, r4
-; BE-NEXT:    bl __truncsfhf2
-; BE-NEXT:    nop
-; BE-NEXT:    sth r3, 0(r30)
-; BE-NEXT:    ld r30, 112(r1) # 8-byte Folded Reload
-; BE-NEXT:    addi r1, r1, 128
-; BE-NEXT:    ld r0, 16(r1)
-; BE-NEXT:    mtlr r0
+; BE-NEXT:    sth r3, 0(r4)
 ; BE-NEXT:    blr
   store half %x, ptr %p
   ret void
@@ -83,34 +45,13 @@ define void @store(half %x, ptr %p) nounwind {
 define half @return(ptr %p) nounwind {
 ; PPC32-LABEL: return:
 ; PPC32:       # %bb.0:
-; PPC32-NEXT:    mflr r0
-; PPC32-NEXT:    stwu r1, -16(r1)
-; PPC32-NEXT:    stw r0, 20(r1)
 ; PPC32-NEXT:    lhz r3, 0(r3)
-; PPC32-NEXT:    bl __extendhfsf2
-; PPC32-NEXT:    lwz r0, 20(r1)
-; PPC32-NEXT:    addi r1, r1, 16
-; PPC32-NEXT:    mtlr r0
 ; PPC32-NEXT:    blr
 ;
-; P8-LABEL: return:
-; P8:       # %bb.0:
-; P8-NEXT:    mflr r0
-; P8-NEXT:    stdu r1, -32(r1)
-; P8-NEXT:    std r0, 48(r1)
-; P8-NEXT:    lhz r3, 0(r3)
-; P8-NEXT:    bl __extendhfsf2
-; P8-NEXT:    nop
-; P8-NEXT:    addi r1, r1, 32
-; P8-NEXT:    ld r0, 16(r1)
-; P8-NEXT:    mtlr r0
-; P8-NEXT:    blr
-;
-; P9-LABEL: return:
-; P9:       # %bb.0:
-; P9-NEXT:    lxsihzx f0, 0, r3
-; P9-NEXT:    xscvhpdp f1, f0
-; P9-NEXT:    blr
+; CHECK-LABEL: return:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    lhz r3, 0(r3)
+; CHECK-NEXT:    blr
 ;
 ; SOFT-LABEL: return:
 ; SOFT:       # %bb.0:
@@ -119,15 +60,7 @@ define half @return(ptr %p) nounwind {
 ;
 ; BE-LABEL: return:
 ; BE:       # %bb.0:
-; BE-NEXT:    mflr r0
-; BE-NEXT:    stdu r1, -112(r1)
-; BE-NEXT:    std r0, 128(r1)
 ; BE-NEXT:    lhz r3, 0(r3)
-; BE-NEXT:    bl __extendhfsf2
-; BE-NEXT:    nop
-; BE-NEXT:    addi r1, r1, 112
-; BE-NEXT:    ld r0, 16(r1)
-; BE-NEXT:    mtlr r0
 ; BE-NEXT:    blr
   %r = load half, ptr %p
   ret half %r
@@ -317,11 +250,6 @@ define dso_local void @stored(ptr nocapture %a, double %b) local_unnamed_addr no
 ; SOFT-NEXT:    std r0, 64(r1)
 ; SOFT-NEXT:    bl __truncdfhf2
 ; SOFT-NEXT:    nop
-; SOFT-NEXT:    clrldi r3, r3, 48
-; SOFT-NEXT:    bl __extendhfsf2
-; SOFT-NEXT:    nop
-; SOFT-NEXT:    bl __truncsfhf2
-; SOFT-NEXT:    nop
 ; SOFT-NEXT:    sth r3, 0(r30)
 ; SOFT-NEXT:    addi r1, r1, 48
 ; SOFT-NEXT:    ld r0, 16(r1)
@@ -400,11 +328,6 @@ define dso_local void @storef(ptr nocapture %a, float %b) local_unnamed_addr nou
 ; SOFT-NEXT:    std r0, 64(r1)
 ; SOFT-NEXT:    bl __truncsfhf2
 ; SOFT-NEXT:    nop
-; SOFT-NEXT:    clrldi r3, r3, 48
-; SOFT-NEXT:    bl __extendhfsf2
-; SOFT-NEXT:    nop
-; SOFT-NEXT:    bl __truncsfhf2
-; SOFT-NEXT:    nop
 ; SOFT-NEXT:    sth r3, 0(r30)
 ; SOFT-NEXT:    addi r1, r1, 48
 ; SOFT-NEXT:    ld r0, 16(r1)
@@ -449,21 +372,8 @@ define void @test_load_store(ptr %in, ptr %out) nounwind {
 ;
 ; SOFT-LABEL: test_load_store:
 ; SOFT:       # %bb.0:
-; SOFT-NEXT:    mflr r0
-; SOFT-NEXT:    std r30, -16(r1) # 8-byte Folded Spill
-; SOFT-NEXT:    stdu r1, -48(r1)
-; SOFT-NEXT:    std r0, 64(r1)
-; SOFT-NEXT:    mr r30, r4
 ; SOFT-NEXT:    lhz r3, 0(r3)
-; SOFT-NEXT:    bl __extendhfsf2
-; SOFT-NEXT:    nop
-; SOFT-NEXT:    bl __truncsfhf2
-; SOFT-NEXT:    nop
-; SOFT-NEXT:    sth r3, 0(r30)
-; SOFT-NEXT:    addi r1, r1, 48
-; SOFT-NEXT:    ld r0, 16(r1)
-; SOFT-NEXT:    ld r30, -16(r1) # 8-byte Folded Reload
-; SOFT-NEXT:    mtlr r0
+; SOFT-NEXT:    sth r3, 0(r4)
 ; SOFT-NEXT:    blr
 ;
 ; BE-LABEL: test_load_store:
@@ -529,35 +439,11 @@ define void @test_bitcast_to_half(ptr %addr, i16 %in) nounwind {
 define half @from_bits(i16 %x) nounwind {
 ; PPC32-LABEL: from_bits:
 ; PPC32:       # %bb.0:
-; PPC32-NEXT:    mflr r0
-; PPC32-NEXT:    stwu r1, -16(r1)
-; PPC32-NEXT:    clrlwi r3, r3, 16
-; PPC32-NEXT:    stw r0, 20(r1)
-; PPC32-NEXT:    bl __extendhfsf2
-; PPC32-NEXT:    lwz r0, 20(r1)
-; PPC32-NEXT:    addi r1, r1, 16
-; PPC32-NEXT:    mtlr r0
 ; PPC32-NEXT:    blr
 ;
-; P8-LABEL: from_bits:
-; P8:       # %bb.0:
-; P8-NEXT:    mflr r0
-; P8-NEXT:    stdu r1, -32(r1)
-; P8-NEXT:    clrldi r3, r3, 48
-; P8-NEXT:    std r0, 48(r1)
-; P8-NEXT:    bl __extendhfsf2
-; P8-NEXT:    nop
-; P8-NEXT:    addi r1, r1, 32
-; P8-NEXT:    ld r0, 16(r1)
-; P8-NEXT:    mtlr r0
-; P8-NEXT:    blr
-;
-; P9-LABEL: from_bits:
-; P9:       # %bb.0:
-; P9-NEXT:    clrlwi r3, r3, 16
-; P9-NEXT:    mtfprwz f0, r3
-; P9-NEXT:    xscvhpdp f1, f0
-; P9-NEXT:    blr
+; CHECK-LABEL: from_bits:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    blr
 ;
 ; SOFT-LABEL: from_bits:
 ; SOFT:       # %bb.0:
@@ -565,15 +451,6 @@ define half @from_bits(i16 %x) nounwind {
 ;
 ; BE-LABEL: from_bits:
 ; BE:       # %bb.0:
-; BE-NEXT:    mflr r0
-; BE-NEXT:    stdu r1, -112(r1)
-; BE-NEXT:    clrldi r3, r3, 48
-; BE-NEXT:    std r0, 128(r1)
-; BE-NEXT:    bl __extendhfsf2
-; BE-NEXT:    nop
-; BE-NEXT:    addi r1, r1, 112
-; BE-NEXT:    ld r0, 16(r1)
-; BE-NEXT:    mtlr r0
 ; BE-NEXT:    blr
   %res = bitcast i16 %x to half
   ret half %res
@@ -582,35 +459,11 @@ define half @from_bits(i16 %x) nounwind {
 define i16 @to_bits(half %x) nounwind {
 ; PPC32-LABEL: to_bits:
 ; PPC32:       # %bb.0:
-; PPC32-NEXT:    mflr r0
-; PPC32-NEXT:    stwu r1, -16(r1)
-; PPC32-NEXT:    stw r0, 20(r1)
-; PPC32-NEXT:    bl __truncsfhf2
-; PPC32-NEXT:    clrlwi r3, r3, 16
-; PPC32-NEXT:    lwz r0, 20(r1)
-; PPC32-NEXT:    addi r1, r1, 16
-; PPC32-NEXT:    mtlr r0
 ; PPC32-NEXT:    blr
 ;
-; P8-LABEL: to_bits:
-; P8:       # %bb.0:
-; P8-NEXT:    mflr r0
-; P8-NEXT:    stdu r1, -32(r1)
-; P8-NEXT:    std r0, 48(r1)
-; P8-NEXT:    bl __truncsfhf2
-; P8-NEXT:    nop
-; P8-NEXT:    clrldi r3, r3, 48
-; P8-NEXT:    addi r1, r1, 32
-; P8-NEXT:    ld r0, 16(r1)
-; P8-NEXT:    mtlr r0
-; P8-NEXT:    blr
-;
-; P9-LABEL: to_bits:
-; P9:       # %bb.0:
-; P9-NEXT:    xscvdphp f0, f1
-; P9-NEXT:    mffprwz r3, f0
-; P9-NEXT:    clrlwi r3, r3, 16
-; P9-NEXT:    blr
+; CHECK-LABEL: to_bits:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    blr
 ;
 ; SOFT-LABEL: to_bits:
 ; SOFT:       # %bb.0:
@@ -618,15 +471,6 @@ define i16 @to_bits(half %x) nounwind {
 ;
 ; BE-LABEL: to_bits:
 ; BE:       # %bb.0:
-; BE-NEXT:    mflr r0
-; BE-NEXT:    stdu r1, -112(r1)
-; BE-NEXT:    std r0, 128(r1)
-; BE-NEXT:    bl __truncsfhf2
-; BE-NEXT:    nop
-; BE-NEXT:    clrldi r3, r3, 48
-; BE-NEXT:    addi r1, r1, 112
-; BE-NEXT:    ld r0, 16(r1)
-; BE-NEXT:    mtlr r0
 ; BE-NEXT:    blr
     %res = bitcast half %x to i16
     ret i16 %res
@@ -804,11 +648,6 @@ define void @test_trunc32(float %in, ptr %addr) nounwind {
 ; SOFT-NEXT:    mr r30, r4
 ; SOFT-NEXT:    bl __truncsfhf2
 ; SOFT-NEXT:    nop
-; SOFT-NEXT:    clrldi r3, r3, 48
-; SOFT-NEXT:    bl __extendhfsf2
-; SOFT-NEXT:    nop
-; SOFT-NEXT:    bl __truncsfhf2
-; SOFT-NEXT:    nop
 ; SOFT-NEXT:    sth r3, 0(r30)
 ; SOFT-NEXT:    addi r1, r1, 48
 ; SOFT-NEXT:    ld r0, 16(r1)
@@ -882,11 +721,6 @@ define void @test_trunc64(double %in, ptr %addr) nounwind {
 ; SOFT-NEXT:    mr r30, r4
 ; SOFT-NEXT:    bl __truncdfhf2
 ; SOFT-NEXT:    nop
-; SOFT-NEXT:    clrldi r3, r3, 48
-; SOFT-NEXT:    bl __extendhfsf2
-; SOFT-NEXT:    nop
-; SOFT-NEXT:    bl __truncsfhf2
-; SOFT-NEXT:    nop
 ; SOFT-NEXT:    sth r3, 0(r30)
 ; SOFT-NEXT:    addi r1, r1, 48
 ; SOFT-NEXT:    ld r0, 16(r1)
@@ -1041,11 +875,6 @@ define void @test_sitofp_i64(i64 %a, ptr %p) nounwind {
 ; SOFT-NEXT:    clrldi r3, r3, 32
 ; SOFT-NEXT:    bl __truncsfhf2
 ; SOFT-NEXT:    nop
-; SOFT-NEXT:    clrldi r3, r3, 48
-; SOFT-NEXT:    bl __extendhfsf2
-; SOFT-NEXT:    nop
-; SOFT-NEXT:    bl __truncsfhf2
-; SOFT-NEXT:    nop
 ; SOFT-NEXT:    sth r3, 0(r30)
 ; SOFT-NEXT:    addi r1, r1, 48
 ; SOFT-NEXT:    ld r0, 16(r1)
@@ -1228,11 +1057,6 @@ define void @test_uitofp_i64(i64 %a, ptr %p) nounwind {
 ; SOFT-NEXT:    nop
 ; SOFT-NEXT:    bl __truncsfhf2
 ; SOFT-NEXT:    nop
-; SOFT-NEXT:    clrldi r3, r3, 48
-; SOFT-NEXT:    bl __extendhfsf2
-; SOFT-NEXT:    nop
-; SOFT-NEXT:    bl __truncsfhf2
-; SOFT-NEXT:    nop
 ; SOFT-NEXT:    sth r3, 0(r30)
 ; SOFT-NEXT:    addi r1, r1, 48
 ; SOFT-NEXT:    ld r0, 16(r1)
@@ -1339,67 +1163,89 @@ define <4 x float> @test_extend32_vec4(ptr %p) nounwind {
 ; P8-LABEL: test_extend32_vec4:
 ; P8:       # %bb.0:
 ; P8-NEXT:    mflr r0
-; P8-NEXT:    stdu r1, -112(r1)
-; P8-NEXT:    li r4, 48
-; P8-NEXT:    std r0, 128(r1)
-; P8-NEXT:    std r30, 96(r1) # 8-byte Folded Spill
-; P8-NEXT:    mr r30, r3
-; P8-NEXT:    lhz r3, 6(r3)
-; P8-NEXT:    stxvd2x vs61, r1, r4 # 16-byte Folded Spill
-; P8-NEXT:    li r4, 64
-; P8-NEXT:    stxvd2x vs62, r1, r4 # 16-byte Folded Spill
+; P8-NEXT:    stdu r1, -144(r1)
 ; P8-NEXT:    li r4, 80
+; P8-NEXT:    std r0, 160(r1)
+; P8-NEXT:    std r29, 120(r1) # 8-byte Folded Spill
+; P8-NEXT:    std r30, 128(r1) # 8-byte Folded Spill
+; P8-NEXT:    stxvd2x vs62, r1, r4 # 16-byte Folded Spill
+; P8-NEXT:    li r4, 96
 ; P8-NEXT:    stxvd2x vs63, r1, r4 # 16-byte Folded Spill
+; P8-NEXT:    lwz r4, 4(r3)
+; P8-NEXT:    stw r4, 64(r1)
+; P8-NEXT:    lwz r3, 0(r3)
+; P8-NEXT:    stw r3, 48(r1)
+; P8-NEXT:    addi r3, r1, 64
+; P8-NEXT:    lxvd2x vs62, 0, r3
+; P8-NEXT:    addi r3, r1, 48
+; P8-NEXT:    lxvd2x vs0, 0, r3
+; P8-NEXT:    mffprd r30, f0
+; P8-NEXT:    clrldi r3, r30, 48
+; P8-NEXT:    clrlwi r3, r3, 16
 ; P8-NEXT:    bl __extendhfsf2
 ; P8-NEXT:    nop
-; P8-NEXT:    lhz r3, 2(r30)
+; P8-NEXT:    mfvsrd r29, vs62
 ; P8-NEXT:    xxlor vs63, f1, f1
+; P8-NEXT:    clrldi r3, r29, 48
+; P8-NEXT:    clrlwi r3, r3, 16
 ; P8-NEXT:    bl __extendhfsf2
 ; P8-NEXT:    nop
-; P8-NEXT:    lhz r3, 4(r30)
-; P8-NEXT:    xxlor vs62, f1, f1
+; P8-NEXT:    rldicl r3, r30, 48, 48
+; P8-NEXT:    xxmrghd vs0, vs1, vs63
+; P8-NEXT:    clrlwi r3, r3, 16
+; P8-NEXT:    xvcvdpsp vs62, vs0
 ; P8-NEXT:    bl __extendhfsf2
 ; P8-NEXT:    nop
-; P8-NEXT:    lhz r3, 0(r30)
-; P8-NEXT:    xxlor vs61, f1, f1
+; P8-NEXT:    rldicl r3, r29, 48, 48
+; P8-NEXT:    xxlor vs63, f1, f1
+; P8-NEXT:    clrlwi r3, r3, 16
 ; P8-NEXT:    bl __extendhfsf2
 ; P8-NEXT:    nop
-; P8-NEXT:    li r3, 80
-; P8-NEXT:    xxmrghd vs0, vs61, vs1
-; P8-NEXT:    xxmrghd vs1, vs63, vs62
-; P8-NEXT:    ld r30, 96(r1) # 8-byte Folded Reload
-; P8-NEXT:    lxvd2x vs63, r1, r3 # 16-byte Folded Reload
-; P8-NEXT:    li r3, 64
+; P8-NEXT:    xxmrghd vs0, vs1, vs63
+; P8-NEXT:    li r3, 96
+; P8-NEXT:    ld r30, 128(r1) # 8-byte Folded Reload
+; P8-NEXT:    ld r29, 120(r1) # 8-byte Folded Reload
 ; P8-NEXT:    xvcvdpsp vs34, vs0
-; P8-NEXT:    xvcvdpsp vs35, vs1
+; P8-NEXT:    lxvd2x vs63, r1, r3 # 16-byte Folded Reload
+; P8-NEXT:    li r3, 80
+; P8-NEXT:    vmrgew v2, v2, v30
 ; P8-NEXT:    lxvd2x vs62, r1, r3 # 16-byte Folded Reload
-; P8-NEXT:    li r3, 48
-; P8-NEXT:    lxvd2x vs61, r1, r3 # 16-byte Folded Reload
-; P8-NEXT:    vmrgew v2, v3, v2
-; P8-NEXT:    addi r1, r1, 112
+; P8-NEXT:    addi r1, r1, 144
 ; P8-NEXT:    ld r0, 16(r1)
 ; P8-NEXT:    mtlr r0
 ; P8-NEXT:    blr
 ;
 ; P9-LABEL: test_extend32_vec4:
 ; P9:       # %bb.0:
-; P9-NEXT:    lhz r4, 6(r3)
+; P9-NEXT:    lwz r4, 4(r3)
+; P9-NEXT:    stw r4, -16(r1)
+; P9-NEXT:    lwz r3, 0(r3)
+; P9-NEXT:    lxv vs34, -16(r1)
+; P9-NEXT:    stw r3, -32(r1)
+; P9-NEXT:    li r3, 0
+; P9-NEXT:    lxv vs35, -32(r1)
+; P9-NEXT:    vex...
[truncated]

@github-actions

github-actions bot commented Jan 8, 2026

🐧 Linux x64 Test Results

  • 188234 tests passed
  • 4998 tests skipped

✅ The build succeeded and all tests passed.

@github-actions

github-actions bot commented Jan 8, 2026

🪟 Windows x64 Test Results

  • 129245 tests passed
  • 2852 tests skipped

✅ The build succeeded and all tests passed.

Contributor

@nikic nikic left a comment


LGTM

@nikic nikic merged commit db26ce5 into llvm:main Jan 8, 2026
11 checks passed
@tgross35 tgross35 deleted the ppc-soft-promote-half branch January 8, 2026 14:40
kshitijvp pushed a commit to kshitijvp/llvm-project that referenced this pull request Jan 9, 2026
…at` (llvm#152632)
