Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DA] remove wrap-around check from affine definition #116632

Merged
merged 1 commit into from
Feb 22, 2025
Merged

Conversation

sebpop
Copy link
Contributor

@sebpop sebpop commented Nov 18, 2024

DA uses SCEV to solve linear constraints. When it generates SCEVs with negative strides, i.e., {0,+,-1}, make sure the SCEVs are marked as non wrap arithmetic. This patch fixes #51512

@llvmbot
Copy link
Member

llvmbot commented Nov 18, 2024

@llvm/pr-subscribers-testing-tools

@llvm/pr-subscribers-llvm-analysis

Author: Sebastian Pop (sebpop)

Changes

DA uses SCEV to solve linear constraints. When it generates SCEVs with negative strides, i.e., {0,+,-1}, make sure the SCEVs are marked as non wrap arithmetic. This patch fixes #51512


Full diff: https://github.com/llvm/llvm-project/pull/116632.diff

2 Files Affected:

  • (modified) llvm/lib/Analysis/DependenceAnalysis.cpp (+4-6)
  • (added) llvm/test/Analysis/DependenceAnalysis/PR51512.ll (+34)
diff --git a/llvm/lib/Analysis/DependenceAnalysis.cpp b/llvm/lib/Analysis/DependenceAnalysis.cpp
index a4a98ea0bae146..f08e4325a0bc75 100644
--- a/llvm/lib/Analysis/DependenceAnalysis.cpp
+++ b/llvm/lib/Analysis/DependenceAnalysis.cpp
@@ -3116,11 +3116,9 @@ const SCEV *DependenceInfo::addToCoefficient(const SCEV *Expr,
                                              const Loop *TargetLoop,
                                              const SCEV *Value) const {
   const SCEVAddRecExpr *AddRec = dyn_cast<SCEVAddRecExpr>(Expr);
-  if (!AddRec) // create a new addRec
-    return SE->getAddRecExpr(Expr,
-                             Value,
-                             TargetLoop,
-                             SCEV::FlagAnyWrap); // Worst case, with no info.
+  if (!AddRec)
+    return SE->getAddRecExpr(Expr, Value, TargetLoop, SCEV::FlagNSW);
+
   if (AddRec->getLoop() == TargetLoop) {
     const SCEV *Sum = SE->getAddExpr(AddRec->getStepRecurrence(*SE), Value);
     if (Sum->isZero())
@@ -3131,7 +3129,7 @@ const SCEV *DependenceInfo::addToCoefficient(const SCEV *Expr,
                              AddRec->getNoWrapFlags());
   }
   if (SE->isLoopInvariant(AddRec, TargetLoop))
-    return SE->getAddRecExpr(AddRec, Value, TargetLoop, SCEV::FlagAnyWrap);
+    return SE->getAddRecExpr(AddRec, Value, TargetLoop, SCEV::FlagNSW);
   return SE->getAddRecExpr(
       addToCoefficient(AddRec->getStart(), TargetLoop, Value),
       AddRec->getStepRecurrence(*SE), AddRec->getLoop(),
diff --git a/llvm/test/Analysis/DependenceAnalysis/PR51512.ll b/llvm/test/Analysis/DependenceAnalysis/PR51512.ll
new file mode 100644
index 00000000000000..52419adff8f8b5
--- /dev/null
+++ b/llvm/test/Analysis/DependenceAnalysis/PR51512.ll
@@ -0,0 +1,34 @@
+; RUN: opt < %s -disable-output "-passes=print<da>" -aa-pipeline=basic-aa
+
+; Check that the testcase does not crash the compiler.
+; https://github.com/llvm/llvm-project/issues/51512
+
+define void @func_1() {
+entry:
+  %l_83.i.i = alloca [2 x [5 x i32]], align 1
+  br label %for.cond857.preheader.i.i
+
+for.cond857.preheader.i.i:                        ; preds = %cleanup.cont1138.i.i, %entry
+  %l_89.08.i.i = phi i32 [ 0, %entry ], [ %add1140.i.i, %cleanup.cont1138.i.i ]
+  %0 = trunc i32 %l_89.08.i.i to i16
+  %1 = add i16 %0, 3
+  %arrayidx916.i.i = getelementptr inbounds [2 x [5 x i32]], [2 x [5 x i32]]* %l_83.i.i, i16 0, i16 %0, i16 %1
+  br label %for.body860.i.i
+
+for.body860.i.i:                                  ; preds = %for.body860.i.i, %for.cond857.preheader.i.i
+  %l_74.07.i.i = phi i32 [ 0, %for.cond857.preheader.i.i ], [ %add964.i.i, %for.body860.i.i ]
+  store i32 undef, i32* %arrayidx916.i.i, align 1
+  %2 = trunc i32 %l_74.07.i.i to i16
+  %arrayidx962.i.i = getelementptr inbounds [2 x [5 x i32]], [2 x [5 x i32]]* %l_83.i.i, i16 0, i16 %2, i16 %1
+  store i32 0, i32* %arrayidx962.i.i, align 1
+  %add964.i.i = add nuw nsw i32 %l_74.07.i.i, 1
+  br i1 false, label %for.body860.i.i, label %cleanup.cont1138.i.i
+
+cleanup.cont1138.i.i:                             ; preds = %for.body860.i.i
+  %add1140.i.i = add nuw nsw i32 %l_89.08.i.i, 1
+  %cmp602.i.i = icmp eq i32 %l_89.08.i.i, 0
+  br i1 %cmp602.i.i, label %for.cond857.preheader.i.i, label %for.cond1480.i.i.preheader
+
+for.cond1480.i.i.preheader:                       ; preds = %cleanup.cont1138.i.i
+  unreachable
+}

TargetLoop,
SCEV::FlagAnyWrap); // Worst case, with no info.
if (!AddRec)
return SE->getAddRecExpr(Expr, Value, TargetLoop, SCEV::FlagNSW);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it safe to assume NSW without any additional information here?

Also, I think as @nikic mentioned, at the moment, we would need to prove that NSW holds in all contexts where the new AddRec is valid

There's #91961 to extend SCEV to allow constructing SCEV's with context-sensitive wrapping flags.

Copy link
Contributor Author

@sebpop sebpop Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it safe to assume NSW without any additional information here?

Here's the additional info.
The AddRec created here is only used in the dependence test. It does not represent the evolution of a scalar variable from the compiled code. The AddRec is used to represent the remaining dependence relation after successive projections. The intended semantics is that of a linear function (i.e., a function that does not wrap around.) Adding wrap-around semantics to the SCEV makes the relation non-linear. classifyPair then return Subscript::NonLinear; and this breaks the dependence test as the non-linear access comes as a surprise this late in the test: all non-linear memory access functions have been filtered out by this time.

we would need to prove that NSW holds in all contexts where the new AddRec is valid

The AddRec created in here is only valid in the data dependence test.
The AddRec is used as a representation of the dependence relation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all non-linear memory access functions have been filtered out by this time.

Where does this happen? If it already ensured that no wrap happens it should be safe.

If I am nitpicky, if the original offset expression is NSW, it would be still possible that intermediate represetions use for dependencies could wrap. E.g. two loops, one with large prime stride p, the other with large prime stride q. They have a theoretical dependence at p*q which might be outside integer precision.

Consider adding a comment into the code why NSW is OK.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Michael for your review.
I updated the commit message to explain that NSW is used to have the semantics of linear function with no wrap around:

DA uses SCEVs to represent linear constraints on memory accesses.
addToCoefficient is only called from the "Constraint Propagation engine":
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Analysis/DependenceAnalysis.cpp#L3877

The propagation engine works on MIV subscripts.
MIV subscripts are pairs of linear memory accesses that vary in multiple loops.
At this point all memory accesses are linear, non-linear have been filtered out.

When addToCoefficient creates an AddRec with flags SCEV::FlagAnyWrap,
classifyPair returns llvm::DependenceInfo::Subscript::NonLinear.
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Analysis/DependenceAnalysis.cpp#L3882

Having a non-linear effect this late in the dependence test is surprising, and
the test fails with llvm_unreachable("bad subscript classification").

Instead of building non-linear functions, this patch modifies the flags to
SCEV::FlagNSW that has the meaning of linear functions that do not wrap around.

I also added a comment to the code:

// When Expr is invariant in TargetLoop, i.e., the stride in TargetLoop is 0,
// addToCoefficient returns an AddRec {Expr, +, Value}_TargetLoop.
// NoWrapFlags is set to FlagNSW for the evolution function to be linear.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NoWrapFlags is set to FlagNSW for the evolution function to be linear.

The NSW flag doesn't make it linear in the mathematical sense, which is impossible on a finite domain. The SCEV still has a type limited to some integer bitwidth, and ScalarEvolution makes use of the fact, e.g. when simplifying expressions.

But if the math behind MIV subscripts are based on linear(/affine) functions on $\mathbb{R}^n \to \mathbb{R}$ with unbounded range (red race car book, p129) means that any wrapping behaviour of $a^+l$ and $a^-u$ would also not be correct indicated by the "bad subscript classification", maybe even less so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the check for affine happens at the beginning of the dependence test, and then during the dep test.
dep test crashes because new non-affine AddRecs are added to the constraints.
The patch does modify the 2 places that build the new AddRecs to follow the definition of affine in checkSubscript.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code defining "affine" was added as a fix here: c0661ae

Copy link
Contributor Author

@sebpop sebpop Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both tests pass after I partial revert c0661ae the check for wrap flags.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (!AddRec->getNoWrapFlags()) passes for any wrap flag, including NW and NUW, not necessarily NSW. So this might re-interpret an unsigned integer as a signed one?

The latest push doesn't even unclude setting SCEV::FlagNSW anymore. Intended?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the change is intended. The last push commit message:

Revert part of c0661ae
that modified the definition of what an affine subscript is.

The code added to checkSubscripts for bug #21959 is not needed.

sjoerdmeijer added a commit to sjoerdmeijer/llvm-project that referenced this pull request Jan 29, 2025
This is a work in progress patch to enable loop-interchange by default
and is a continuation of the RFC:

https://discourse.llvm.org/t/enabling-loop-interchange/82589

Basically, we promised to fix any compile-time and correctness issues in
the different components involved here (loop-interchange and dependence
analaysis.) before discussing enabling interchange by default. We think
are close to complete this; I would like to explain where we are and
wanted to check if there are any thoughts or concerns. A quick overview
of the correctness and compile-time improvements that we have made include:

Correctness:
- [LoopInterchange] Remove 'S' Scalar Dependencies (llvm#119345)
- [LoopInterchange] Fix overflow in cost calculation (llvm#111807)
- [LoopInterchange] Handle LE and GE correctly (PR llvm#124901) @kasuga-fj
- [DA] disambiguate evolution of base addresses (llvm#116628)

Compile-times:
- [LoopInterchange] Constrain number of load/stores in a loop (llvm#118973)
- [LoopInterchange] Bail out early if minimum loop nest is not met (llvm#115128)
- [LoopInterchange] Hoist isComputableLoopNest() in the control flow (llvm#124247)

And in terms of remaining work, we think we are very close to fixing
these depenence analysis issues:
- [DA] do not handle array accesses of different offsets (llvm#123436)
- [DA] Dependence analysis does not handle array accesses of different sizes (llvm#116630)
- [DA] use NSW arithmetic llvm#116632

The compile-time increase with a geomean increase of 0.19% looks good
(after committing llvm#124247), I think:

  stage1-O3:
  Benchmark
  kimwitu++        +0.10%
  sqlite3          +0.14%
  consumer-typeset +0.07%
  Bullet           +0.06%
  tramp3d-v4       +0.21%
  mafft            +0.39%
  ClamAVi          +0.06%
  lencod           +0.61%
  SPASS            +0.17%
  7zip             +0.08%
  geomean          +0.19%

See also:
http://llvm-compile-time-tracker.com/compare.php?from=19a7fe03b4f58c4f73ea91d5e63bc4c6e61f987b&to=b24f1367d68ee675ea93ecda4939208c6b68ae4b&stat=instructions%3Au

We might want to look into lencod to see if we can improve more, but not
sure it is strictly necessary.
@sebpop sebpop force-pushed the da-4 branch 2 times, most recently from 04ec5a0 to e575d9d Compare February 5, 2025 19:03
@Meinersbur
Copy link
Member

Can you update the description? "make sure the SCEVs are marked as non wrap arithmetic" doesn't match to the current code diff.

For adding NSW, I crafted some example:

for (uint32_t i = 0x02; i < min((uint32_t)n,2); i+= 0x7FFFFFFF) {
   if (i == 0x02)
    A[0x80000001] = 42;
  else if (i == 0x80000001)
    use(A[i]);
}

there should be a RAW dependencies from (i==0 to i==0x80000001). The AddRec for i should be marked as non-self-wrapping (and NUW) due to the restricted trip count. If we create a new AddRec for the subscript of A[i] with NSW, SCEV might the only valid value for i is i==0, because for i==1 the value 0x02 + 1*0x7FFFFFFF is wrapping a signed integer. DependenceInfo therefore might conclude there is no dependency ⚡

@sebpop sebpop changed the title [DA] use NSW arithmetic [DA] remove wrap-around check from affine definition Feb 18, 2025
@sebpop
Copy link
Contributor Author

sebpop commented Feb 20, 2025

@Meinersbur any other comments on this patch?
Ok to merge?

Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to update the description explaining the latest version of the fix and why it is safe to remove the check

%alloca = alloca [2 x [5 x i32]], align 1
br label %bb1

bb1: ; preds = %bb7, %bb
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to have more descriptive names for the block make the test easier to read, e.g. bb1 -> loop.header,, `bb7 -> loop.latch).

br label %bb1

bb1: ; preds = %bb7, %bb
%phi = phi i32 [ 0, %bb ], [ %add8, %bb7 ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be clearer to use more descriptive names, e..g

Suggested change
%phi = phi i32 [ 0, %bb ], [ %add8, %bb7 ]
%iv = phi i32 [ 0, %bb ], [ %iv.next, %bb7 ]

Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Please also consider @fhahn's comments

Fix llvm#51512
by reverting a part of commit
llvm@c0661ae
that modified the definition affine subscripts.
@sebpop sebpop merged commit 5f8da7e into llvm:main Feb 22, 2025
11 checks passed
@sebpop sebpop deleted the da-4 branch February 22, 2025 00:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants