-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vectorization of linalg.fill #1095
base: main
Are you sure you want to change the base?
Conversation
9c5d008
to
14e0dbb
Compare
… ops (#1117) This is part of the PR to vectorize linalg.fill: #1095 Basically one of the patterns introduced in #1095 means that in one of the subsequent passes (lowering to llvm dialect) a cast operation is introduced outside of an `aie.core`, which needs to be inside the aie.core for core-to-standard to work. i.e. we need to sink an operation into an aie.core. Before this PR, there is already a pass to sink operations into `amdaie.core`. This PR refactors that pass so that it can be reused to sink into `aie.core` (or any other regioned op).
14e0dbb
to
9c12e38
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice % few comments.
compiler/plugins/target/AMD-AIE/aievec/VectorToVectorConversions.cpp
Outdated
Show resolved
Hide resolved
compiler/plugins/target/AMD-AIE/aievec/VectorToVectorConversions.cpp
Outdated
Show resolved
Hide resolved
compiler/plugins/target/AMD-AIE/aievec/VectorToVectorConversions.cpp
Outdated
Show resolved
Hide resolved
compiler/plugins/target/AMD-AIE/aievec/VectorToVectorConversions.cpp
Outdated
Show resolved
Hide resolved
assert(initialVectorType && "vector must be of vector type"); | ||
assert(writeDestinationType.getElementType() == | ||
initialVectorType.getElementType() && | ||
"element types must match"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my understanding : Why aren't these a candidate for returning match failure instead of asserting ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, a match failure is for ops that don't seem 'broken', they just don't fit a particular pattern. But I think a transfer_read that doesn't satisfy the checks here suggests something is very wrong, and we/user should take action to assess the situation. If this wasn't a pattern-based pass, I'd probably signalPassFailure()
, but that's not an option with the pattern based approach.
compiler/plugins/target/AMD-AIE/aievec/VectorToVectorConversions.cpp
Outdated
Show resolved
Hide resolved
compiler/plugins/target/AMD-AIE/aievec/VectorToVectorConversions.cpp
Outdated
Show resolved
Hide resolved
compiler/plugins/target/AMD-AIE/aievec/test/canonicalize_transfer_write_for_load.mlir
Outdated
Show resolved
Hide resolved
@@ -65,41 +66,37 @@ void AMDAIEVectorizationPass::runOnOperation() { | |||
SmallVector<Operation *> candidates; | |||
funcOp.walk([&](Operation *op) { | |||
// Only vectorize linalg ops (for now) | |||
if (!isa<linalg::LinalgOp>(op)) return; | |||
if (!isa<linalg::LinalgOp>(op)) return WalkResult::advance(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and elsewhere : WalkResult::skip()
perhaps ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, it might work (and would be better because more efficient in theory). But I'd prefer to play with this in another PR -- I'd like to try and get vectorization working for all of these ops eventually.
691a16e
to
d9f51aa
Compare
arith::ConstantOp constantVectorSource = [&writeOp]() -> arith::ConstantOp { | ||
Value current = writeOp.getVector(); | ||
while (Operation *op = current.getDefiningOp()) { | ||
if (auto cOp = dyn_cast<arith::ConstantOp>(op)) return cOp; | ||
if (op->getNumOperands() != 1) return {}; | ||
current = op->getOperand(0); | ||
} | ||
return {}; | ||
}(); | ||
if (!constantVectorSource) { | ||
return rewriter.notifyMatchFailure( | ||
writeOp, "vector isn't derived from arith.constant"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment on maybeSplat
applies here as well. No need to define such inlined functions when it's really just invoked once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll change it, but I actually find the approach with lambdas clearer to read: (1) my eye can skip to end of function if it's not interested in it's details of the traversal (2) name encapsulation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, done! It does look nice now.
a5680a7
to
8bc90eb
Compare
8bc90eb
to
f696e6c
Compare
This PR contains multiple changes to get vectorized assembly through peano, I will split it into multiple PRs.
Eyeballing the first test in the performance benchmark:
Before:
After:
So a nice saving on program memory, and maybe a marginal throughput boost. There a consistent saving of 1K memory across all (non-ukernel) benchmarks