
[LoweringStrategy] Refactor to take num of rows/cols as inputs #955

Merged: 5 commits into nod-ai:main on Dec 5, 2024

Conversation

@yzhang93 (Contributor) commented Dec 3, 2024

This PR adds support for taking the number of cores (via the flags --iree-amdaie-num-rows and --iree-amdaie-num-cols) as input and generating L2 tile sizes accordingly. It also addresses some remaining issues when switching to 4x4 array usage.

  • In the AIR pipeline, matmul ops now use the 4x4 array by default. However, the matmul-elementwise tests don't work with the 4x4 array, so flags were added to use a 2x2 array in CI. CC @erwei-xilinx
  • Also renamed the option usePassPipeline to useTilePipeline.

maxCoreCols = 8;
break;
default:
llvm::errs() << "unhandled NPU partitioning.\n";
Contributor:

Isn't this supposed to bail out or exit at this point? I'm not sure what llvm::errs() does besides printing the message. If it acts as an assert and bails out, then this is okay.

Contributor:

Yes, I'd put an assert(false && "unhandled target device, we must specify the array size here").

But isn't this info available already from somewhere like [linked code snippet]?

@yzhang93 (Contributor Author):

It's not just printing the message; it bails out like an assert. And llvm::errs() is used throughout this file.

@newling @jtuyls Yes, we can use deviceModel.configPtr.NumCols to get num_rows/cols, but it still needs some hardcoded adjustments to get the number of rows/cols with cores. For example, npu1_4col returns 5 cols and 6 rows, so we need to subtract 1 col and 2 rows for cores, while npu4 returns 8 cols and 6 rows, so we only need to subtract 2 rows for cores.
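For reference, a standalone sketch (plain C++, no LLVM headers; the enum and helper names are hypothetical) of the "print then bail out" pattern being discussed. Note that llvm::errs() by itself only writes to stderr; it is the explicit abort (or llvm::report_fatal_error / assert) that actually terminates:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>

enum class NpuPartition { FourCol, EightCol };  // hypothetical stand-in

// Sketch of the switch in the lowering strategy: the default path should
// terminate, since printing alone would let compilation continue with an
// unset maxCoreCols.
uint32_t maxCoreColsFor(NpuPartition p) {
  switch (p) {
    case NpuPartition::FourCol:
      return 4;
    case NpuPartition::EightCol:
      return 8;
  }
  // Unreachable for the enum above; mirrors the "unhandled NPU
  // partitioning" default case in the PR.
  std::fprintf(stderr, "unhandled NPU partitioning.\n");  // like llvm::errs()
  std::abort();  // like assert(false) / llvm::report_fatal_error
}
```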

Collaborator:

You can create a new function in the device model, getNumCoreRows, that returns the number of rows with cores. For the 5 columns, we probably just need to change that in the device model.

@yzhang93 (Contributor Author) commented Dec 4, 2024:

Okay, I figured out we can use deviceModel.columns() to get the correct number of cols from the device. It returns 4 cols for npu1_4col and 5 cols for npu1. But I think that's okay; we'd never use 5 cols for the npu1 device.
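A minimal sketch (plain C++; helper names hypothetical, geometries taken from the numbers quoted in this thread) of the row/col adjustments being discussed. One shim row and one mem-tile row sit below the core rows, hence the hardcoded offset of 2 that the PR's getNumCoreRows() uses:

```cpp
#include <cstdint>

// Hypothetical mock of the device-model queries; the real values come from
// the device configuration (configPtr).
struct DeviceGeometry {
  uint32_t rows;  // total rows: shim row + mem-tile row + core rows
  uint32_t cols;  // columns visible to the partition
};

// Core rows start above the shim row (row 0) and the mem-tile row (row 1).
uint32_t numCoreRows(const DeviceGeometry &g) { return g.rows - 2; }
uint32_t numCoreCols(const DeviceGeometry &g) { return g.cols; }

// npu1_4col: deviceModel.columns() == 4, rows() == 6 -> 4x4 core array.
// npu4:      8 cols, 6 rows                          -> 4x8 core array.
```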

// CHECK-PACK-PEEL{LITERAL}: #packingConfig = #amdaie.packing_config<packing_config = [{packedSizes = [44, 64, 64], transposePackIndices = [1], unpackEmpty = [false], innerPerm = [[1, 0]], outerPerm = [[0, 1]]}, {packedSizes = [0, 0, 0, 4, 4, 8], transposePackIndices = [0, 1, 2], unpackEmpty = [false, false, true], innerPerm = [[0, 1], [1, 0], [0, 1]], outerPerm = [[0, 1, 3, 2], [0, 1, 3, 2], [0, 1, 3, 2]]}]>
// CHECK-PACK-PEEL{LITERAL}: #packingConfig = #amdaie.packing_config<packing_config = [{packedSizes = [44, 32, 64], transposePackIndices = [1], unpackEmpty = [false], innerPerm = [[1, 0]], outerPerm = [[0, 1]]}, {packedSizes = [0, 0, 0, 4, 4, 8], transposePackIndices = [0, 1, 2], unpackEmpty = [false, false, true], innerPerm = [[0, 1], [1, 0], [0, 1]], outerPerm = [[0, 1, 3, 2], [0, 1, 3, 2], [0, 1, 3, 2]]}]>
Contributor:

Why did it change from 64 -> 32?

Wouldn't this warrant inclusion of the row/col flags?

@yzhang93 (Contributor Author):

No, it's because we changed the default core usage from 2x2 to 4x4 for the AIR path. If you use the flags --iree-amdaie-num-rows=2 --iree-amdaie-num-cols=2 you will see the previous sizes.
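A toy illustration (hypothetical numbers and formula, not the strategy's actual computation) of why a packed size halves when the core array doubles along a dimension: the same problem dimension is split across twice as many cores.

```cpp
#include <cstdint>

// Toy model: the L2 tile along a dimension is the problem size divided by
// the number of cores along that dimension (the real lowering strategy
// applies additional constraints).
uint32_t l2Tile(uint32_t dim, uint32_t numCores) { return dim / numCores; }

// With a hypothetical dimension of 128:
//   2 cores along it -> l2Tile(128, 2) == 64
//   4 cores along it -> l2Tile(128, 4) == 32
```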

self.add_aie_compilation_flags(
[
"--iree-amdaie-matmul-elementwise-fusion",
"--iree-amdaie-num-rows=2",
Contributor:

Doesn't this change the default for all matmuls from 4x4 to 2x2? Maybe not what we want?

I'd be interested to know how many of the tests

for n_rows in [1,4]:
  for n_cols in [1,2,3,4]:
    # run matmul with these values  (on npu1_4col for say M=N=K=3*256).

work

@jtuyls (Collaborator) commented Dec 4, 2024:

It's being shown in a confusing way here, but these changes are actually in MatmulThinBias, not Matmul.

@yzhang93 (Contributor Author):

Yes, it's added to MatmulThinBias class and used for matmul-elementwise tests in AIR.

Contributor:

Ha, I totally missed that...

[screenshot]

@newling (Contributor) commented Dec 4, 2024:

Can the flags --xrt_lite_n_core_rows and --xrt_lite_n_core_cols be reused, instead of adding new flags --iree-amdaie-num-rows and --iree-amdaie-num-cols ? When would I use different values for these?

@jtuyls (Collaborator) commented Dec 4, 2024:

> Can the flags --xrt_lite_n_core_rows and --xrt_lite_n_core_cols be reused, instead of adding new flags --iree-amdaie-num-rows and --iree-amdaie-num-cols ? When would I use different values for these?

--xrt_lite_n_core_rows and --xrt_lite_n_core_cols are iree-run-module flags. I don't think they can be reused in iree-compile.

@newling (Contributor) commented Dec 4, 2024:

> > Can the flags --xrt_lite_n_core_rows and --xrt_lite_n_core_cols be reused, instead of adding new flags --iree-amdaie-num-rows and --iree-amdaie-num-cols ? When would I use different values for these?
>
> --xrt_lite_n_core_rows and --xrt_lite_n_core_cols are iree-run-module flags. I don't think they can be reused in iree-compile.

That makes sense. Maybe (unrelated to this PR) xrt_lite_n_core_rows and xrt_lite_n_core_cols can be inferred from the device? i.e. always just take the maximum, so for npu1_4col, infer that xrt_lite_n_core_rows = xrt_lite_n_core_cols = 4? There was probably a reason Maks didn't do it this way, though.

@newling (Contributor) left a comment:

Looks good, thanks for this.

I tried this locally with run.py with M=N=512, K=4096 and different numbers of columns and rows, and I see

--iree-amdaie-num-rows=4 --iree-amdaie-num-cols=1
error: 'amdaie.logicalobjectfifo.from_memref' op should have at least one tile candidate

--iree-amdaie-num-rows=4 --iree-amdaie-num-cols=2
<unknown>:0: error: 'aie.memtile_dma' op could not find and assign a valid BD id

--iree-amdaie-num-rows=4 --iree-amdaie-num-cols=3
error: 'amdaie.logicalobjectfifo.from_memref' op should have at least one tile candidate

--iree-amdaie-num-rows=1 --iree-amdaie-num-cols=2
error: 'amdaie.logicalobjectfifo.from_memref' op should have at least one tile candidate

@yzhang93 have you got any example with n_rows != n_cols to run end to end? I'm just curious; I'm happy for progress to be made on this as a follow-up.

@@ -368,6 +368,9 @@ struct AMDAIEDeviceModel {
return deviceConfig.maxVectorSizeBits;
}

uint32_t getNumCoreRows() const { return rows() - 2; }
Contributor:

/// Hardcoded row_offset == 2 -> AIE core rows start from 2

I'm wondering if there might ever be more than 1 row of L2 memory tiles, or some other reason this might not work in the future.

Collaborator:

It should just be configPtr.AieTileNumRows

uint32_t maxCoreRows = deviceModel.getNumCoreRows();
uint32_t maxCoreCols = deviceModel.getNumCoreCols();
if (options.AMDAIENumRows <= 0 || options.AMDAIENumRows > maxCoreRows) {
llvm::report_fatal_error("option numRows is out of range\n");
Contributor:

Suggested change
llvm::report_fatal_error("option numRows is out of range\n");
llvm::report_fatal_error(llvm::Twine("Invalid number of core rows (") +
std::to_string(options.AMDAIENumRows) +
"), must be in the range [1, " +
std::to_string(maxCoreRows) + "] for device " +
stringifyEnum(deviceModel.device));

Might save you in the future! A lit test of this would be nice if possible.
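A standalone sketch (plain C++, using an exception in place of llvm::report_fatal_error; names hypothetical) of the range check with the richer message suggested above:

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Returns normally if numRows is in [1, maxCoreRows]; otherwise throws with
// a message in the suggested style. The real code calls
// llvm::report_fatal_error, which terminates the process -- which is also
// why an expected-error lit test is hard to write for it.
void checkNumRows(int numRows, uint32_t maxCoreRows,
                  const std::string &device) {
  if (numRows <= 0 || static_cast<uint32_t>(numRows) > maxCoreRows)
    throw std::invalid_argument(
        "Invalid number of core rows (" + std::to_string(numRows) +
        "), must be in the range [1, " + std::to_string(maxCoreRows) +
        "] for device " + device);
}
```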

@yzhang93 (Contributor Author) commented Dec 5, 2024:

Thanks for the suggestion! I've adopted it and manually tested the error messages. However, I don't think I can add a lit test because the program crashes with the error; expected-error doesn't work in such situations.

@@ -59,6 +59,8 @@ struct AMDAIEOptions {
bool insertLoopAroundCoreBlock{false};
bool matmulElementwiseFusion{false};
AMDAIEDevice AMDAIETargetDevice{AMDAIEDevice::npu1_4col};
int AMDAIENumRows{4};
Contributor:

These default values can be inferred from the default AMDAIETargetDevice: create device model and then call getNumCoreRows() and getNumCoreCols()

My thinking is that the fewer defaults to twiddle when we one day change to npu4 as the default, the better.

@@ -231,6 +233,14 @@ struct AMDAIEOptions {
clEnumValN(AMDAIEDevice::npu4, "npu4",
"Strix B0 NPU with 8 columns and 6 rows")));

binder.opt<int>("iree-amdaie-num-rows", AMDAIENumRows,
llvm::cl::cat(category),
llvm::cl::desc("Number of rows used in an AIE core array"));
Contributor:

Suggested change
llvm::cl::desc("Number of rows used in an AIE core array"));
llvm::cl::desc("Number of rows used in an AIE core array. The compiler will choose a tiling strategy that uses no more than this number of rows. However, some workloads (like convolution) currently ignore this flag, and use a hardcoded number of rows."));

or something like that, for users without the context

@yzhang93 (Contributor Author) commented Dec 4, 2024:

> [@newling's review comment above, quoted in full]

Thanks for trying different combinations! I think we had some e2e tests running on 4x2 previously in a dev branch, but now, as I tried, these failed with the same errors as above. @jtuyls It looks like we'll need to generalize the AssignTiles pass.

@Abhishek-Varma (Contributor) left a comment:

Thanks @yzhang93 !

LGTM % one comment.

Comment on lines 39 to 40
OpPassManager &variantPassManager, AMDAIEDevice device, int numRows,
int numCols, TilePassPipeline useTilePipeline,
Contributor:

Suggested change
OpPassManager &variantPassManager, AMDAIEDevice device, int numRows,
int numCols, TilePassPipeline useTilePipeline,
OpPassManager &variantPassManager, AMDAIEDevice device, uint32_t numRows,
uint32_t numCols, TilePassPipeline useTilePipeline,

@yzhang93 yzhang93 enabled auto-merge (squash) December 5, 2024 06:52
@yzhang93 yzhang93 merged commit 63bc340 into nod-ai:main Dec 5, 2024
7 checks passed
@newling (Contributor) left a comment:

LGTM
