Do not outline elementwise operations #954

Merged: newling merged 4 commits into nod-ai:main from do_not_outline_elmwise on Dec 4, 2024

@newling newling commented Dec 3, 2024

What observations motivate this PR?

I compared the input.opt.ll files produced with and without coalescing. One difference is the number of calls to the outlined elementwise function. After two runs of the run.py script, which created the output directories no_coalescing and with_coalescing, grepping in those directories gives:

grep "call void @generic_elementwise" no_coalescing/matmul_truncf_128_256_bf16_f32/input.opt.ll | wc -l
16
grep "call void @generic_elementwise" with_coalescing/matmul_truncf_128_256_bf16_f32/input.opt.ll | wc -l
0

Notice that there are no calls to the elementwise function when coalescing is enabled, i.e. the calls must have been inlined. For matmul, the number of calls is the same in both cases:

grep "call void @generic_matmul" no_coalescing/matmul_truncf_128_256_bf16_f32/input.opt.ll | wc -l
128
grep "call void @generic_matmul" with_coalescing/matmul_truncf_128_256_bf16_f32/input.opt.ll | wc -l
128
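
The counts above come from grep piped to wc -l. A minimal sketch of the same comparison in Python follows, which may be convenient when several function prefixes are checked at once. The sample IR and the full function names `generic_matmul_0_outlined` and `generic_elementwise_0_outlined` are made up for illustration; only the `generic_matmul` and `generic_elementwise` prefixes come from the commands above.

```python
import re

# Matches direct call sites such as `call void @some_function(...)`.
# Unlike the grep pattern, this tolerates extra whitespace between tokens.
CALL_RE = re.compile(r"\bcall\s+void\s+@([\w.$]+)")


def count_calls(ir_text: str, prefix: str) -> int:
    """Count `call void @<prefix>...` sites in LLVM IR text."""
    return sum(
        1 for match in CALL_RE.finditer(ir_text)
        if match.group(1).startswith(prefix)
    )


# Hypothetical IR fragment, not taken from a real input.opt.ll.
SAMPLE_IR = """\
define void @main() {
  call void @generic_matmul_0_outlined(ptr %a, ptr %b, ptr %c)
  call void @generic_elementwise_0_outlined(ptr %c, ptr %d)
  call void @generic_matmul_0_outlined(ptr %a, ptr %b, ptr %c)
  ret void
}
"""

for prefix in ("generic_matmul", "generic_elementwise"):
    print(prefix, count_calls(SAMPLE_IR, prefix))
```

To compare real runs, read each input.opt.ll with `pathlib.Path(...).read_text()` and pass the text to `count_calls`.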

Background fact: without coalescing we run out of program memory by a wide margin (we hit 30 kB, far above the allowed 16 kB). With coalescing, we fit.

So it would seem that inlining the function calls dramatically reduces the memory required. I can't explain this.

Given the above observation, this PR removes function outlining for the elementwise ops, so that they are effectively always inlined. With this change, the program fits in memory even without coalescing.

@newling newling marked this pull request as ready for review December 3, 2024 23:42
@newling newling removed the request for review from nirvedhmeshram December 3, 2024 23:42
@Abhishek-Varma Abhishek-Varma left a comment

This seems like a very good find, @newling! Thank you!

Another supporting argument that adds to what you mentioned above: since we don't outline linalg.fill (a producer fused with the matmul as part of a single loop, i.e. the prologue), there shouldn't be any need to outline an elementwise op (a consumer fused with the matmul as part of a single loop, i.e. the epilogue).
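
The resulting policy can be sketched as a small predicate. This is a hypothetical Python model, not the actual pass: the names `OpKind` and `should_outline` are invented here, and the only facts taken from this thread are that matmul calls remain outlined while linalg.fill and, after this PR, elementwise ops are not.

```python
from enum import Enum, auto


class OpKind(Enum):
    """Hypothetical classification of ops inside the fused loop."""
    FILL = auto()         # producer fused with the matmul (prologue)
    MATMUL = auto()       # the core computation
    ELEMENTWISE = auto()  # consumer fused with the matmul (epilogue)


def should_outline(kind: OpKind) -> bool:
    """Assumed policy after this PR: only the matmul is outlined."""
    return kind is OpKind.MATMUL


for kind in OpKind:
    print(kind.name, should_outline(kind))
```

Under this model the prologue and epilogue are treated the same way, which is the symmetry the argument above relies on.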

LGTM % one review comment.

// CHECK-NEXT: func.call @[[MATMUL_K6]](%[[A1]], %[[B1]], %[[C]])
// CHECK-NEXT: amdaie.end
// CHECK: return
func.func @matmul_example_2(%A0: memref<4x4xbf16>, %B0: memref<4x4xbf16>,
Contributor

No need for this example. If required, you can add one linalg.matmul to an already existing test case above.

Contributor Author

I like this test. I think it's good to have a sequence of tests, each changing slightly what is tested relative to the preceding one, because it makes diagnosing failures easier. I should use a more descriptive name though...

@newling newling force-pushed the do_not_outline_elmwise branch from ae7aa5a to dce2ab1 Compare December 4, 2024 16:52
@newling newling enabled auto-merge (squash) December 4, 2024 17:16
@newling newling changed the title from "Do not outline elementwise operation" to "Do not outline elementwise operations" Dec 4, 2024
@newling newling merged commit cbb17c9 into nod-ai:main Dec 4, 2024
7 checks passed
@newling newling deleted the do_not_outline_elmwise branch December 12, 2024 23:08