Do not outline elementwise operations #954
This seems like a very good find, @newling! Thank you!
Another supporting argument that adds to what you mentioned above: since we don't outline linalg.fill (a producer fused to the matmul, part of a single loop, the PROLOGUE), there shouldn't be any need to outline an elementwise op (a consumer fused to the matmul, part of a single loop, the EPILOGUE).
LGTM % one review comment.
// CHECK-NEXT: func.call @[[MATMUL_K6]](%[[A1]], %[[B1]], %[[C]])
// CHECK-NEXT: amdaie.end
// CHECK: return
func.func @matmul_example_2(%A0: memref<4x4xbf16>, %B0: memref<4x4xbf16>,
No need for this example. If required, you can add one linalg.matmul to an already existing test case above.
I like this test. I think it's good to have a sequence of tests, each changing slightly what the preceding test checks, because that makes diagnosing failures easier. I should use a more descriptive name though...
ae7aa5a to dce2ab1
What observations motivate this PR?
I compared the input.opt.ll files with and without coalescing. One difference I noticed was in the number of calls to the outlined elementwise function. Specifically, I grepped the output directory of the run.py script after two runs that created the directories no_coalescing and with_coalescing.
The grep results show no function calls to the elementwise func when coalescing is enabled, i.e. the calls must have been inlined. For matmul, by contrast, the number of calls is the same in both runs.
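The comparison described above can be approximated with a short script. This is only a sketch: the function name `count_calls`, the callee names `matmul_outlined` and `elementwise_outlined`, and the two IR snippets are made up for illustration and are not the actual contents of the input.opt.ll files.

```python
import re
from collections import Counter

# Matches the callee name in a direct LLVM IR call instruction, e.g.
#   call void @some_func(ptr %0)
# Quoted names (@"...") and indirect calls are not handled in this sketch.
CALL_RE = re.compile(r"\bcall\b[^@\n]*@([\w.$-]+)\s*\(")


def count_calls(ir_text: str) -> Counter:
    """Count direct calls per callee name in LLVM IR text."""
    return Counter(CALL_RE.findall(ir_text))


# Hypothetical IR fragments standing in for the two input.opt.ll files.
NO_COALESCING = """
define void @main() {
  call void @matmul_outlined(ptr %a, ptr %b, ptr %c)
  call void @matmul_outlined(ptr %a, ptr %b, ptr %c)
  call void @elementwise_outlined(ptr %c)
  ret void
}
"""

WITH_COALESCING = """
define void @main() {
  call void @matmul_outlined(ptr %a, ptr %b, ptr %c)
  call void @matmul_outlined(ptr %a, ptr %b, ptr %c)
  ret void
}
"""

if __name__ == "__main__":
    for label, text in [("no_coalescing", NO_COALESCING),
                        ("with_coalescing", WITH_COALESCING)]:
        print(label, dict(count_calls(text)))
```

To use it on real output, read each input.opt.ll file into a string and pass it to `count_calls`; a callee that is missing from the result has no remaining direct calls in that file.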
Background fact: we run out of program memory quite badly without coalescing (we hit 30 kB, much more than the allowed 16 kB). With coalescing, we fit.
So it would seem that inlining the function calls dramatically reduces the memory required. I can't explain this.
Based on the above observation, this PR removes function outlining for the elementwise ops, so that they are effectively always inlined. With this change, the program fits in memory even without coalescing.