Use range parameter attributes to fold sub+icmp u* into icmp s* #134028

Open
scottmcm opened this issue Apr 2, 2025 · 14 comments · May be fixed by #134556
Assignees
Labels
good first issue, llvm:instcombine, missed-optimization

Comments


scottmcm commented Apr 2, 2025

(This example comes from looking at Rust's discriminant code from https://rust.godbolt.org/z/1138aGqdb)

Take this input IR:

define noundef zeroext i1 @is_foo(i8 noundef range(i8 -1, 5) %x) unnamed_addr {
start:
  %0 = sub i8 %x, 2
  %1 = zext i8 %0 to i64
  %2 = icmp ule i8 %0, 2
  %3 = add i64 %1, 1
  %_2 = select i1 %2, i64 %3, i64 0
  %_0 = icmp eq i64 %_2, 0
  ret i1 %_0
}

Today, LLVM does simplify it a bunch, getting it down to https://llvm.godbolt.org/z/G7as7Y6o5

define noundef zeroext i1 @is_foo(i8 noundef range(i8 -1, 5) %x) unnamed_addr #0 {
  %0 = add nsw i8 %x, -5
  %1 = icmp ult i8 %0, -3
  ret i1 %1
}

It could do better, though. The range information was used to determine the nsw, but that doesn't really help the unsigned icmp.

Specifically, the range restriction is enough that it'd be allowed to be just https://alive2.llvm.org/ce/z/fkVEwL

define noundef zeroext i1 @is_foo(i8 noundef range(i8 -1, 5) %x) unnamed_addr #0 {
start:
  %1 = icmp slt i8 %x, 2
  ret i1 %1
}

Eliminating the need for the add altogether.


scottmcm commented Apr 2, 2025

Come to think of it, another way to get to the right place would be to look at the simplified output:

define noundef zeroext i1 @src(i8 noundef range(i8 -1, 5) %x) unnamed_addr #0 {
start:
  %0 = add nsw i8 %x, -5
  %1 = icmp ult i8 %0, -3
  ret i1 %1
}

And use the range information to notice %0 is strictly negative, and thus change it to https://alive2.llvm.org/ce/z/tWujha

define noundef zeroext i1 @tgt(i8 noundef range(i8 -1, 5) %x) unnamed_addr #0 {
start:
  %0 = add nsw i8 %x, -5
  %1 = icmp samesign slt i8 %0, -3
  ret i1 %1
}

Which would then InstCombine down to the desired icmp slt i8 %x, 2: https://llvm.godbolt.org/z/ET7Kx4nr5


dtcxzyw commented Apr 2, 2025

We can split this into two PRs:

  1. Infer samesign using range info. However, I don't recommend doing this in InstCombine, as it has been handled by CVP.
  2. Handle icmp samesign ult (x +nsw 5), -3 -> icmp slt x, 2 in InstCombinerImpl::foldICmpAddConstant.

@dtcxzyw added the good first issue, llvm:instcombine, and missed-optimization labels and removed the new issue label on Apr 2, 2025

llvmbot commented Apr 2, 2025

Hi!

This issue may be a good introductory issue for people new to working on LLVM. If you would like to work on this issue, your first steps are:

  1. Check that no other contributor has already been assigned to this issue. If you believe that no one is actually working on it despite an assignment, ping the person. After one week without a response, the assignee may be changed.
  2. In the comments of this issue, request for it to be assigned to you, or just create a pull request after following the steps below. Mention this issue in the description of the pull request.
  3. Fix the issue locally.
  4. Run the test suite locally. Remember that the subdirectories under test/ create fine-grained testing targets, so you can e.g. use make check-clang-ast to only run Clang's AST tests.
  5. Create a Git commit.
  6. Run git clang-format HEAD~1 to format your changes.
  7. Open a pull request to the upstream repository on GitHub. Detailed instructions can be found in GitHub's documentation. Mention this issue in the description of the pull request.

If you have any further questions about this issue, don't hesitate to ask via a comment in the thread below.


llvmbot commented Apr 2, 2025

@llvm/issue-subscribers-good-first-issue

Author: None (scottmcm)


@RAJAGOPALAN-GANGADHARAN

@scottmcm @dtcxzyw This looks like an interesting issue; can I give it a shot? Thanks!


dtcxzyw commented Apr 2, 2025

@RAJAGOPALAN-GANGADHARAN Please read https://llvm.org/docs/InstCombineContributorGuide.html before submitting a patch. Good luck :)

@RAJAGOPALAN-GANGADHARAN

@dtcxzyw Thanks for assigning the issue. I spent a couple of hours trying to understand what's going on, and I have some questions about the first part:

Infer samesign using range info. However, I don't recommend doing this in InstCombine, as it has been handled by CVP

I can see that samesign is added only at O2 or higher, while the issue used O1. Is the expectation to make this work at O1? From what I understand, that is not the right expectation, because all the cmp optimizations are done at the O2 level.


scottmcm commented Apr 2, 2025

Ah, the -O1 here is my mistake. I'd been using that to avoid vectorization in #134024, but I'm fine if this is only fixed at O2.

That said, if I take the repro and change it to O3, I do see that it gets the samesign https://llvm.godbolt.org/z/Pj1W6vfM8 but it still doesn't simplify down to just an icmp -- the add is still there.

~~So maybe it's an awkward phase-ordering problem? :/~~

Hmm, there is an InstCombine after CVP: https://llvm.godbolt.org/z/xdjMGj81c. But somehow it is not already doing this, even though in https://llvm.godbolt.org/z/ET7Kx4nr5 it's InstCombine that removes the add.

I have no idea what is going on here.

Oh, I failed to read my own example. Right, InstCombine does fix it when it's slt, but it's not made into slt yet.


RAJAGOPALAN-GANGADHARAN commented Apr 2, 2025

@scottmcm, if I swap the ult to slt when we add the samesign, I can see the optimizer folding it to icmp slt automatically. This seems like a good fix to me: essentially, I'm checking whether both operands are negative, and if so, swapping the "u*" predicate for "s*".

Let me know what you think. Thanks!


scottmcm commented Apr 2, 2025

For any specifics about how to handle things, definitely trust what @dtcxzyw has to say over me, because I do not understand the LLVM internals at any meaningful level. I just look at IR and run Alive2 :P

It does sound at least plausible to me to flip to a signed operation if we know that the sign bit is set. That said, I do not know whether the usual canonicalization is to unsigned ops or signed ones when both work. It could potentially also be that the pattern should look for the add+icmp and change that. Or perhaps some signal other than both sides being negative would be better -- for example, maybe the reason to switch from ult to slt is not that it's negative, but that one of the inputs is nsw. I don't know.

It looks like #134028 (comment) suggested that it be a fold on the add+icmp specifically, so you should probably stick with that unless there is a strong reason to do otherwise.


dtcxzyw commented Apr 3, 2025

@scottmcm, if I swap the ult to slt when we add the samesign, I can see the optimizer folding it to icmp slt automatically. This seems like a good fix to me: essentially, I'm checking whether both operands are negative, and if so, swapping the "u*" predicate for "s*".

Let me know what you think. Thanks!

The InstCombine pass canonicalizes icmp slt into icmp samesign ult when possible. The inverse transform is not allowed because it would lead to infinite loops.


RAJAGOPALAN-GANGADHARAN commented Apr 6, 2025

@dtcxzyw If icmp slt is already getting canonicalized to icmp samesign ult, when should the inverse be done? Probably only when both numbers are negative? Sorry for the dumb questions!


dtcxzyw commented Apr 6, 2025

@dtcxzyw If icmp slt is already getting canonicalized to icmp samesign ult, when should the inverse be done? Probably only when both numbers are negative? Sorry for the dumb questions!

Generally, we should never invert the canonicalization in IR. But we can temporarily refine samesign ult into slt in the pattern-matching logic.

In this case, the expected transformation happens at:

  // If the add does not wrap, we can always adjust the compare by subtracting
  // the constants. Equality comparisons are handled elsewhere. SGE/SLE/UGE/ULE
  // are canonicalized to SGT/SLT/UGT/ULT.
  if ((Add->hasNoSignedWrap() &&
       (Pred == ICmpInst::ICMP_SGT || Pred == ICmpInst::ICMP_SLT)) ||
      (Add->hasNoUnsignedWrap() &&
       (Pred == ICmpInst::ICMP_UGT || Pred == ICmpInst::ICMP_ULT))) {
    bool Overflow;
    APInt NewC =
        Cmp.isSigned() ? C.ssub_ov(*C2, Overflow) : C.usub_ov(*C2, Overflow);
    // If there is overflow, the result must be true or false.
    // TODO: Can we assert there is no overflow because InstSimplify always
    // handles those cases?
    if (!Overflow)
      // icmp Pred (add nsw X, C2), C --> icmp Pred X, (C - C2)
      return new ICmpInst(Pred, X, ConstantInt::get(Ty, NewC));
  }

The current implementation only expects a slt/sgt if the add has NSW. To make it accept samesign ult/ugt as well, we should change the type of Pred into CmpPredicate to preserve the samesign flag, then use getPreferredSignedPredicate to get the refined predicate:

  // If the add does not wrap, we can always adjust the compare by subtracting
  // the constants. Equality comparisons are handled elsewhere. SGE/SLE/UGE/ULE
  // are canonicalized to SGT/SLT/UGT/ULT.
  if ((Add->hasNoSignedWrap() &&
       (Pred.getPreferredSignedPredicate() == ICmpInst::ICMP_SGT ||
        Pred.getPreferredSignedPredicate() == ICmpInst::ICMP_SLT)) ||
      (Add->hasNoUnsignedWrap() &&
       (Pred == ICmpInst::ICMP_UGT || Pred == ICmpInst::ICMP_ULT))) {

@RAJAGOPALAN-GANGADHARAN

@dtcxzyw Thanks for the pointers :) I did find this earlier; I was more interested in which cases would justify the switch from slt to samesign ult and vice versa. Now I think I have some idea of why these optimizations are done: essentially, in this case there is a comparison between a partially negative range and an always-positive constant. That makes it a signed comparison, so overflow is UB and we are free to optimize on it. Since we are sure at least one side needs a signed representation, this kind of fold is expected.
