
Vectorization of linalg.fill #1095

Open · newling wants to merge 7 commits into main from towards_vectorized_fill

Conversation

@newling (Contributor) commented on Feb 11, 2025

This PR contains multiple changes needed to get vectorized assembly through peano; I will split it into multiple PRs.

Eyeballing the first test in the performance benchmark:

Before:

matmul_512_512_4096_bf16_f32_O2_npu1_4col_benchmark
--------------------------------------------------------------------------------------------------
Benchmark                                        Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------
BM_matmul/process_time/real_time_mean         2673 us         66.3 us            5 items_per_second=374.194/s
BM_matmul/process_time/real_time_median       2668 us         63.1 us            5 items_per_second=374.766/s
BM_matmul/process_time/real_time_stddev       19.3 us         16.7 us            5 items_per_second=2.69245/s
--------------------------------------------------------------------------------------------------
The largest program memory size (read from byte 72 of elf files) is 11184 bytes

After:

matmul_512_512_4096_bf16_f32_O2_npu1_4col_benchmark
--------------------------------------------------------------------------------------------------
Benchmark                                        Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------
BM_matmul/process_time/real_time_mean         2601 us         48.6 us            5 items_per_second=385.252/s
BM_matmul/process_time/real_time_median       2540 us         39.1 us            5 items_per_second=393.696/s
BM_matmul/process_time/real_time_stddev        132 us         21.0 us            5 items_per_second=18.3503/s
--------------------------------------------------------------------------------------------------
The largest program memory size (read from byte 72 of elf files) is 10112 bytes

So a nice saving on program memory, and maybe a marginal throughput boost. There is a consistent saving of about 1 KB of program memory across all (non-ukernel) benchmarks.

@newling force-pushed the towards_vectorized_fill branch 2 times, most recently from 9c5d008 to 14e0dbb on February 12, 2025 00:10
newling added a commit that referenced this pull request on Feb 20, 2025: … ops (#1117)

This is part of the PR to vectorize linalg.fill:
#1095

Basically, one of the patterns introduced in #1095 means that in one of the subsequent passes (lowering to the LLVM dialect) a cast operation is introduced outside of an `aie.core`, and it needs to be inside the `aie.core` for core-to-standard to work. In other words, we need to sink an operation into an `aie.core`. Before this PR there was already a pass to sink operations into `amdaie.core`; this PR refactors that pass so that it can be reused to sink into `aie.core` (or any other regioned op).
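
For context, a minimal sketch of what sinking an op into a regioned op involves in MLIR; this is not the refactored pass itself, and the helper and variable names are illustrative assumptions:

```cpp
#include "mlir/IR/Operation.h"
#include "llvm/ADT/STLExtras.h"

using namespace mlir;

// Rough sketch only: if an op defined outside a regioned op (e.g. aie.core)
// is used only inside that op's body, move it to the top of the body so that
// all of its users remain dominated by it.
static void sinkIntoRegion(Operation *opToSink, Operation *regionedOp) {
  Region &body = regionedOp->getRegion(0);
  bool onlyUsedInside =
      llvm::all_of(opToSink->getUsers(), [&](Operation *user) {
        return body.isAncestor(user->getParentRegion());
      });
  if (!onlyUsedInside) return;
  // Move to the start of the entry block; operands defined in the enclosing
  // region still dominate the new location.
  opToSink->moveBefore(&body.front(), body.front().begin());
}
```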
@newling force-pushed the towards_vectorized_fill branch from 14e0dbb to 9c12e38 on February 20, 2025 15:47
@newling marked this pull request as ready for review on February 20, 2025 22:34
@Abhishek-Varma (Contributor) left a comment:
Nice! A few comments.

Comment on lines +449 to +453
    assert(initialVectorType && "vector must be of vector type");
    assert(writeDestinationType.getElementType() ==
               initialVectorType.getElementType() &&
           "element types must match");
Contributor:
For my understanding: why aren't these a candidate for returning a match failure instead of asserting?

newling (Contributor, Author):

IMO, a match failure is for ops that don't seem 'broken'; they just don't fit a particular pattern. But I think a transfer_read that doesn't satisfy the checks here suggests something is very wrong, and we/the user should take action to assess the situation. If this weren't a pattern-based pass, I'd probably signalPassFailure(), but that's not an option with the pattern-based approach.
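
For illustration only (not the PR's actual pattern), a sketch of how the two outcomes differ inside a rewrite pattern; the pattern name and the minor-identity check are assumptions made up for the example:

```cpp
#include "mlir/Dialect/Vector/IR/VectorOps.h"
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// Hypothetical pattern illustrating the distinction discussed above.
struct ExampleTransferWritePattern
    : public OpRewritePattern<vector::TransferWriteOp> {
  using OpRewritePattern::OpRewritePattern;

  LogicalResult matchAndRewrite(vector::TransferWriteOp writeOp,
                                PatternRewriter &rewriter) const override {
    // Out-of-scope IR: not broken, just not a case this pattern handles.
    // Return a match failure so other patterns can still apply.
    if (!writeOp.getPermutationMap().isMinorIdentity())
      return rewriter.notifyMatchFailure(writeOp,
                                         "not a minor identity transfer");

    // Broken IR: the written value must be a vector by construction, so a
    // violation here means something upstream went very wrong -- assert.
    auto vectorType = dyn_cast<VectorType>(writeOp.getVector().getType());
    assert(vectorType && "vector must be of vector type");
    (void)vectorType;

    // (actual rewrite omitted in this sketch)
    return failure();
  }
};
```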

@@ -65,41 +66,37 @@ void AMDAIEVectorizationPass::runOnOperation() {
SmallVector<Operation *> candidates;
funcOp.walk([&](Operation *op) {
// Only vectorize linalg ops (for now)
-    if (!isa<linalg::LinalgOp>(op)) return;
+    if (!isa<linalg::LinalgOp>(op)) return WalkResult::advance();
Contributor:

Here and elsewhere: WalkResult::skip() perhaps?

newling (Contributor, Author):

I'm not sure; it might work (and would in theory be more efficient). But I'd prefer to play with this in another PR -- I'd like to try to get vectorization working for all of these ops eventually.
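
As a rough illustration of the walk-result options under discussion (the `vectorizable` predicate is hypothetical, and this is not the pass's actual logic):

```cpp
// advance() continues the walk into nested regions, skip() moves on without
// visiting the op's regions, and interrupt() stops the whole walk.
funcOp.walk([&](Operation *op) {
  if (!isa<linalg::LinalgOp>(op))
    return WalkResult::advance();  // keep descending: linalg ops may be nested inside
  if (!vectorizable(op))           // hypothetical predicate
    return WalkResult::skip();     // a rejected linalg op has nothing else worth visiting
  candidates.push_back(op);
  return WalkResult::advance();
});
```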

@newling force-pushed the towards_vectorized_fill branch from 691a16e to d9f51aa on February 21, 2025 16:14
Comment on lines 457 to 467
arith::ConstantOp constantVectorSource = [&writeOp]() -> arith::ConstantOp {
  Value current = writeOp.getVector();
  while (Operation *op = current.getDefiningOp()) {
    if (auto cOp = dyn_cast<arith::ConstantOp>(op)) return cOp;
    if (op->getNumOperands() != 1) return {};
    current = op->getOperand(0);
  }
  return {};
}();
if (!constantVectorSource) {
  return rewriter.notifyMatchFailure(
      writeOp, "vector isn't derived from arith.constant");
}
Contributor:

The comment on maybeSplat applies here as well. No need to define such an inlined function when it's really just invoked once.

newling (Contributor, Author):

I'll change it, but I actually find the approach with lambdas clearer to read: (1) my eye can skip to the end of the function if I'm not interested in the details of the traversal, and (2) name encapsulation.

newling (Contributor, Author):
Ok, done! It does look nice now.
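
For illustration, the refactor being discussed might look roughly like the following; the helper name `getConstantVectorSource` is an assumption for the example, not necessarily what the PR ended up with:

```cpp
// Hypothetical standalone helper replacing the immediately-invoked lambda
// above: walk back through single-operand defining ops from the written
// vector and return the arith.constant it originates from, if any.
static arith::ConstantOp getConstantVectorSource(Value vector) {
  Value current = vector;
  while (Operation *op = current.getDefiningOp()) {
    if (auto cOp = dyn_cast<arith::ConstantOp>(op)) return cOp;
    if (op->getNumOperands() != 1) return {};
    current = op->getOperand(0);
  }
  return {};
}

// At the call site in the pattern, the lambda call becomes an ordinary call:
arith::ConstantOp constantVectorSource =
    getConstantVectorSource(writeOp.getVector());
if (!constantVectorSource) {
  return rewriter.notifyMatchFailure(
      writeOp, "vector isn't derived from arith.constant");
}
```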

@newling force-pushed the towards_vectorized_fill branch from a5680a7 to 8bc90eb on February 21, 2025 19:37
@newling force-pushed the towards_vectorized_fill branch from 8bc90eb to f696e6c on February 21, 2025 23:42