-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LoweringStrategy] Refactor to take num of rows/cols as inputs #955
Conversation
maxCoreCols = 8; | ||
break; | ||
default: | ||
llvm::errs() << "unhandled NPU partitioning.\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this supposed to bail out or exit at this point ?
I'm not sure what llvm::errs()
does besides printing the message. In case it acts as an "assert" and behaves in the "bail out" mechanism then this is okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I'd put an assert(false && "unhandled target device, we must specify the array size here")
But isn't this info available already from somewhere like
int rows() const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not just printing the message, it will bail out like assert
. And there are llvm::errs()
used through out this file.
@newling @jtuyls Yes, we can use deviceModel.configPtr.NumCols
to get num_rows/cols, but it still need some hardcode adjustments to show number of rows/cols for cores. For example, npu1_4col
returns 5 cols and 6 rows, so we need to -1 cols and -2 rows for cores. While npu4
returns 8 cols and 6 rows, so we only need to -2 rows for cores.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can create a new function in the device model getNumCoreRows
that returns the number of rows with cores and for the 5 columns, we probably just need to change that in device model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I figured out we can use deviceModel.columns()
to get correct number of cols from the device. It returns 4 cols for npu1_4col
, and 5 cols for npu1
. But I think it's okay, we'd never use 5 cols for npu1 device.
compiler/plugins/target/AMD-AIE/iree-amd-aie/Transforms/KernelDispatch.cpp
Outdated
Show resolved
Hide resolved
compiler/plugins/target/AMD-AIE/iree-amd-aie/Transforms/KernelDispatch.cpp
Outdated
Show resolved
Hide resolved
// CHECK-PACK-PEEL{LITERAL}: #packingConfig = #amdaie.packing_config<packing_config = [{packedSizes = [44, 64, 64], transposePackIndices = [1], unpackEmpty = [false], innerPerm = [[1, 0]], outerPerm = [[0, 1]]}, {packedSizes = [0, 0, 0, 4, 4, 8], transposePackIndices = [0, 1, 2], unpackEmpty = [false, false, true], innerPerm = [[0, 1], [1, 0], [0, 1]], outerPerm = [[0, 1, 3, 2], [0, 1, 3, 2], [0, 1, 3, 2]]}]> | ||
// CHECK-PACK-PEEL{LITERAL}: #packingConfig = #amdaie.packing_config<packing_config = [{packedSizes = [44, 32, 64], transposePackIndices = [1], unpackEmpty = [false], innerPerm = [[1, 0]], outerPerm = [[0, 1]]}, {packedSizes = [0, 0, 0, 4, 4, 8], transposePackIndices = [0, 1, 2], unpackEmpty = [false, false, true], innerPerm = [[0, 1], [1, 0], [0, 1]], outerPerm = [[0, 1, 3, 2], [0, 1, 3, 2], [0, 1, 3, 2]]}]> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did it change from 64
-> 32
?
Wouldn't this warrant inclusion of the row/col flags ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's because now we change 2x2 core usage to 4x4 by default for AIR path. If you use flag --iree-amdaie-num-rows=2 --iree-amdaie-num-cols=2
you will see the previous sizes.
self.add_aie_compilation_flags( | ||
[ | ||
"--iree-amdaie-matmul-elementwise-fusion", | ||
"--iree-amdaie-num-rows=2", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this change the default for all matmuls from 4x4 to 2x2? Maybe not what we want?
I'd be interested to know how many of the tests
for n_rows in [1,4]:
for n_cols in [1,2,3,4]:
# run matmul with these values (on npu1_4col for say M=N=K=3*256).
work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's being shown in a confusing way here, but these changes are actually in MatmulThinBias
, not Matmul
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's added to MatmulThinBias
class and used for matmul-elementwise tests in AIR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the flags |
|
That makes sense. Maybe (unrelated to this PR) the |
48a2496
to
0cf2348
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks for this.
I tried this locally with run.py with t M=N=512 K=4096 with different numbers of colums and rows, and I see
--iree-amdaie-num-rows=4", "--iree-amdaie-num-cols=1"
error: 'amdaie.logicalobjectfifo.from_memref' op should have at least one tile candidate
--iree-amdaie-num-rows=4", "--iree-amdaie-num-cols=2"
<unknown>:0: error: 'aie.memtile_dma' op could not find and assign a valid BD id
--iree-amdaie-num-rows=4", "--iree-amdaie-num-cols=3"
error: 'amdaie.logicalobjectfifo.from_memref' op should have at least one tile candidate
--iree-amdaie-num-rows=1", "--iree-amdaie-num-cols=2"
error: 'amdaie.logicalobjectfifo.from_memref' op should have at least one tile candidate
@yzhang93 have you got any example with n_rows != n_cols to run end to end? I'm just curious, I'm happy for progress to made on this as a follow-up.
@@ -368,6 +368,9 @@ struct AMDAIEDeviceModel { | |||
return deviceConfig.maxVectorSizeBits; | |||
} | |||
|
|||
uint32_t getNumCoreRows() const { return rows() - 2; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Hardcoded row_offset == 2 -> AIE core rows start from 2 |
I'm wondering if there will might ever be more than 1 row of L2 memory tiles, or some other reason this might not work in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should just be configPtr.AieTileNumRows
uint32_t maxCoreRows = deviceModel.getNumCoreRows(); | ||
uint32_t maxCoreCols = deviceModel.getNumCoreCols(); | ||
if (options.AMDAIENumRows <= 0 || options.AMDAIENumRows > maxCoreRows) { | ||
llvm::report_fatal_error("option numRows is out of range\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
llvm::report_fatal_error("option numRows is out of range\n"); | |
llvm::report_fatal_error(llvm::Twine("Invalid number of core rows (") + | |
std::to_string(options.AMDAIENumRows) + | |
"), must be in the range [1, " + | |
std::to_string(maxCoreRows) + "] for device " + | |
stringifyEnum(deviceModel.device)); |
Might save you in the future! A lit test of this would be nice if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion! I've adopted it and manually tested the error messages. However, I don't think I can add a lit test because the program crashed with the error. expected-error
doesn't work in such situations.
@@ -59,6 +59,8 @@ struct AMDAIEOptions { | |||
bool insertLoopAroundCoreBlock{false}; | |||
bool matmulElementwiseFusion{false}; | |||
AMDAIEDevice AMDAIETargetDevice{AMDAIEDevice::npu1_4col}; | |||
int AMDAIENumRows{4}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These default values can be inferred from the default AMDAIETargetDevice: create device model and then call getNumCoreRows()
and getNumCoreCols()
My thinking is that the fewer defaults to twiddle when we one day change to npu4 as the default, the better.
@@ -231,6 +233,14 @@ struct AMDAIEOptions { | |||
clEnumValN(AMDAIEDevice::npu4, "npu4", | |||
"Strix B0 NPU with 8 columns and 6 rows"))); | |||
|
|||
binder.opt<int>("iree-amdaie-num-rows", AMDAIENumRows, | |||
llvm::cl::cat(category), | |||
llvm::cl::desc("Number of rows used in an AIE core array")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
llvm::cl::desc("Number of rows used in an AIE core array")); | |
llvm::cl::desc("Number of rows used in an AIE core array. The compiler will choose a tiling strategy that uses no more than this number of rows. However, some workloads (like convolution) currently ignore this flag, and use a hardcoded number of rows.")); |
or something like that, for users without the context
Thanks for trying different combinations! I think we had some e2e test running on 4x2 previously in a dev branch, but now as I tried and these failed with the same errors above. @jtuyls It looks like we'll need to generalize |
0cf2348
to
055db81
Compare
055db81
to
ec3f57a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @yzhang93 !
LGTM % one comment.
OpPassManager &variantPassManager, AMDAIEDevice device, int numRows, | ||
int numCols, TilePassPipeline useTilePipeline, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OpPassManager &variantPassManager, AMDAIEDevice device, int numRows, | |
int numCols, TilePassPipeline useTilePipeline, | |
OpPassManager &variantPassManager, AMDAIEDevice device, uint32_t numRows, | |
uint32_t numCols, TilePassPipeline useTilePipeline, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This PR added support to take the number of cores (using flag
--iree-amdaie-num-rows --iree-amdaie-num-cols
) as input and generate L2 tile sizes accordingly. This PR also addressed some remaining issues when switching to 4x4 array usage.usePassPipeline
touseTilePipeline
.