Use range parameter attributes to fold sub+icmp u* into icmp s* #134028

Open
scottmcm opened this issue Apr 2, 2025 · 14 comments · May be fixed by #134556
Assignees
Labels
good first issue, llvm:instcombine, missed-optimization

Comments


scottmcm commented Apr 2, 2025

(This example comes from looking at Rust's discriminant code from https://rust.godbolt.org/z/1138aGqdb)

Take this input IR:

define noundef zeroext i1 @is_foo(i8 noundef range(i8 -1, 5) %x) unnamed_addr {
start:
  %0 = sub i8 %x, 2
  %1 = zext i8 %0 to i64
  %2 = icmp ule i8 %0, 2
  %3 = add i64 %1, 1
  %_2 = select i1 %2, i64 %3, i64 0
  %_0 = icmp eq i64 %_2, 0
  ret i1 %_0
}

Today, LLVM does simplify it a bunch, getting it down to https://llvm.godbolt.org/z/G7as7Y6o5

define noundef zeroext i1 @is_foo(i8 noundef range(i8 -1, 5) %x) unnamed_addr #0 {
  %0 = add nsw i8 %x, -5
  %1 = icmp ult i8 %0, -3
  ret i1 %1
}

It could do better, though. The range information was used to determine the nsw, but that doesn't really help the unsigned icmp.

Specifically, the range restriction is enough that it'd be allowed to be just https://alive2.llvm.org/ce/z/fkVEwL

define noundef zeroext i1 @is_foo(i8 noundef range(i8 -1, 5) %x) unnamed_addr #0 {
start:
  %1 = icmp slt i8 %x, 2
  ret i1 %1
}

Eliminating the need for the add altogether.


scottmcm commented Apr 2, 2025

Come to think of it, another way to get to the right place would be to look at the simplified output:

define noundef zeroext i1 @src(i8 noundef range(i8 -1, 5) %x) unnamed_addr #0 {
start:
  %0 = add nsw i8 %x, -5
  %1 = icmp ult i8 %0, -3
  ret i1 %1
}

And use the range information to notice %0 is strictly negative, and thus change it to https://alive2.llvm.org/ce/z/tWujha

define noundef zeroext i1 @tgt(i8 noundef range(i8 -1, 5) %x) unnamed_addr #0 {
start:
  %0 = add nsw i8 %x, -5
  %1 = icmp samesign slt i8 %0, -3
  ret i1 %1
}

Which would then InstCombine down to the desired icmp slt i8 %x, 2: https://llvm.godbolt.org/z/ET7Kx4nr5


dtcxzyw commented Apr 2, 2025

We can split this into two PRs:

  1. Infer samesign using range info. However, I don't recommend doing this in InstCombine, as it has been handled by CVP.
  2. Handle icmp samesign ult (x +nsw 5), -3 -> icmp slt x, 2 in InstCombinerImpl::foldICmpAddConstant.

@dtcxzyw added the good first issue, llvm:instcombine, and missed-optimization labels and removed the new issue label on Apr 2, 2025

llvmbot commented Apr 2, 2025

Hi!

This issue may be a good introductory issue for people new to working on LLVM. If you would like to work on this issue, your first steps are:

  1. Check that no other contributor has already been assigned to this issue. If you believe that no one is actually working on it despite an assignment, ping the person. After one week without a response, the assignee may be changed.
  2. In the comments of this issue, request for it to be assigned to you, or just create a pull request after following the steps below. Mention this issue in the description of the pull request.
  3. Fix the issue locally.
  4. Run the test suite locally. Remember that the subdirectories under test/ create fine-grained testing targets, so you can e.g. use make check-clang-ast to only run Clang's AST tests.
  5. Create a Git commit.
  6. Run git clang-format HEAD~1 to format your changes.
  7. Open a pull request to the upstream repository on GitHub. Detailed instructions can be found in GitHub's documentation. Mention this issue in the description of the pull request.

If you have any further questions about this issue, don't hesitate to ask via a comment in the thread below.


llvmbot commented Apr 2, 2025

@llvm/issue-subscribers-good-first-issue

Author: None (scottmcm)


@RAJAGOPALAN-GANGADHARAN

@scottmcm @dtcxzyw This looks like an interesting issue; can I give it a shot? Thanks!


dtcxzyw commented Apr 2, 2025

@RAJAGOPALAN-GANGADHARAN Please read https://llvm.org/docs/InstCombineContributorGuide.html before submitting a patch. Good luck :)

@RAJAGOPALAN-GANGADHARAN

@dtcxzyw Thanks for assigning the issue. I spent a couple of hours trying to understand what's going on, and I have some questions about the first part:

Infer samesign using range info. However, I don't recommend doing this in InstCombine, as it has been handled by CVP

I can see that samesign is added only at O2 or higher, while the issue used O1. Is the expectation to make this work at O1? From what I understand, that is not the right expectation, because all the cmp optimizations are done at the O2 level.


scottmcm commented Apr 2, 2025

Ah, the -O1 here is my mistake. I'd been using that to avoid vectorization in #134024, but I'm fine if this is only fixed at O2.

That said, if I take the repro and change it to O3, I do see that it gets the samesign https://llvm.godbolt.org/z/Pj1W6vfM8 but it still doesn't simplify down to just an icmp -- the add is still there.

~~So maybe it's an awkward phase-ordering problem? :/~~

Hmm, there is an InstCombine after CVP: https://llvm.godbolt.org/z/xdjMGj81c. But somehow it is not already doing this, even though in https://llvm.godbolt.org/z/ET7Kx4nr5 it's InstCombine that removes the add.

I have no idea what is going on here.

Oh, I failed to read my own example. Right, InstCombine does fix it when it's slt, but it's not made into slt yet.


RAJAGOPALAN-GANGADHARAN commented Apr 2, 2025

@scottmcm, if I swap the ult to slt when we add the samesign, I can see the optimizer folding it to icmp slt automatically. This seems like a good fix to me: essentially, I'm checking whether both operands are negative, and if so, swapping the "u*" predicate for "s*".

Let me know what you think. Thanks!


scottmcm commented Apr 2, 2025

For any specifics about how to handle things, definitely trust what @dtcxzyw has to say over me, because I do not understand the LLVM internals at any meaningful level. I just look at IR and run Alive2 :P

It does sound at least plausible to me to flip to a signed operation if we know that the sign bit is set. That said, I do not know whether the usual canonicalization is to unsigned ops or signed ones when both work. It could potentially also be that the pattern should look for the add+icmp and change that. Or perhaps some signal other than both sides being negative would be better -- for example, maybe the reason to switch from ult to slt is not that it's negative, but that one of the inputs is nsw. I don't know.

It looks like #134028 (comment) suggested that it be a fold on the add+icmp specifically, so you should probably stick with that unless there is a strong reason to do otherwise.


dtcxzyw commented Apr 3, 2025

@scottmcm, if I swap the ult to slt when we add the samesign, I can see the optimizer folding it to icmp slt automatically. This seems like a good fix to me: essentially, I'm checking whether both operands are negative, and if so, swapping the "u*" predicate for "s*".

Let me know what you think. Thanks!

The InstCombine pass canonicalizes icmp slt into icmp samesign ult when possible. The inverse transform is not allowed because it would lead to infinite loops.


RAJAGOPALAN-GANGADHARAN commented Apr 6, 2025

@dtcxzyw If icmp slt is already getting canonicalized to icmp samesign ult, when should the inverse be done? Probably only when both numbers are negative? Sorry for the dumb questions!


dtcxzyw commented Apr 6, 2025

@dtcxzyw If icmp slt is already getting canonicalized to icmp samesign ult, when should the inverse be done? Probably only when both numbers are negative? Sorry for the dumb questions!

Generally, we should never invert the canonicalization in IR. But we can temporarily refine samesign ult into slt in the pattern-matching logic.

In this case, the expected transformation happens at:

  // If the add does not wrap, we can always adjust the compare by subtracting
  // the constants. Equality comparisons are handled elsewhere. SGE/SLE/UGE/ULE
  // are canonicalized to SGT/SLT/UGT/ULT.
  if ((Add->hasNoSignedWrap() &&
       (Pred == ICmpInst::ICMP_SGT || Pred == ICmpInst::ICMP_SLT)) ||
      (Add->hasNoUnsignedWrap() &&
       (Pred == ICmpInst::ICMP_UGT || Pred == ICmpInst::ICMP_ULT))) {
    bool Overflow;
    APInt NewC =
        Cmp.isSigned() ? C.ssub_ov(*C2, Overflow) : C.usub_ov(*C2, Overflow);
    // If there is overflow, the result must be true or false.
    // TODO: Can we assert there is no overflow because InstSimplify always
    // handles those cases?
    if (!Overflow)
      // icmp Pred (add nsw X, C2), C --> icmp Pred X, (C - C2)
      return new ICmpInst(Pred, X, ConstantInt::get(Ty, NewC));
  }

The current implementation only expects a slt/sgt if the add has NSW. To make it accept samesign ult/ugt as well, we should change the type of Pred into CmpPredicate to preserve the samesign flag, then use getPreferredSignedPredicate to get the refined predicate:

  // If the add does not wrap, we can always adjust the compare by subtracting
  // the constants. Equality comparisons are handled elsewhere. SGE/SLE/UGE/ULE
  // are canonicalized to SGT/SLT/UGT/ULT.
  if ((Add->hasNoSignedWrap() &&
       (Pred.getPreferredSignedPredicate() == ICmpInst::ICMP_SGT ||
        Pred.getPreferredSignedPredicate() == ICmpInst::ICMP_SLT)) ||
      (Add->hasNoUnsignedWrap() &&
       (Pred == ICmpInst::ICMP_UGT || Pred == ICmpInst::ICMP_ULT))) {

@RAJAGOPALAN-GANGADHARAN

@dtcxzyw Thanks for the pointers :) I did find this earlier; I was more interested in which cases would justify the switch from slt to samesign ult and vice versa. Now I think I have some idea of why these optimizations are done: essentially, in this case there is a comparison between a partially negative range and an always-positive constant. That makes it a signed comparison, so overflow is UB and we are free to optimize on it. Since we are sure at least one side needs a signed representation, this kind of fold is expected.
