Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TKW] Add missing validation and error messages #432

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

GMNGeoffrey
Copy link
Contributor

@GMNGeoffrey GMNGeoffrey commented Jan 29, 2025

This is a collection of small changes adding additional validation or more verbose error messages to TKW.

Includes:

  • More verbose (or any) error messages for existing asserts and exceptions. Maybe this is too much, but the current messages frequently don't tell you what's going wrong. I think it's better to err on the side of too much information. If these get annoying for some reason we can trim them.
  • Transforming some builtin KeyError into more specific error messages with context
  • Making the validation if get_custom is passed an op an error instead. I think this more likely than not points to a bug, so it's better for it to be an error. I can't remember the specific case in which I hit it.
  • Adding earlier validation if there's a type mismatch between a reduce op init and return types.
  • Adding earlier validation if there's an issue when decomposing reduce ops and the local reduction doesn't match the accumulator reduction.
  • Reporting the argument that has an issue if there's a failure in decomposing reduce ops.
  • Printing which node had an issue if there's a failure during codegen
  • Validating that reduction and generated for loop have the same number of arguments. This otherwise results in a failure later on, but we can give more useful information here. I reported [TKW] Bug: reduction expansion loses return values #384 for the bug that causes this to fire.
  • Validating MMA shapes. m has to be in lhs and n in rhs. Locally, I actually have much more restrictive validation that lhs had to be [..., m, k] and rhs [..., n, k]. In theory it looks like Wave is supposed to figure things out if that isn't the setup, but I never had a case where it actually worked, so it seems like you need walk some narrow path. This version is the less restrictive one though.
  • Reporting more information if IREE invocation fails.
  • Removing assumption that a reduction has users in get_users
  • Removing some unused variables and arguments
  • Adding a __str__ method for IndexingContext and __repr__ methods to ExpansionInfo and ReductionInfo. Maybe these should be data classes?
  • Adding some missing types to some functions

One note is that I'm not really sure what the convention is for Exception types in the project, so a lot of these are just RuntimeError. That's not awesome, but I think it's still a lot more helpful than nothing. I tried to avoid raise ... from as in my experience these usually result in unhelpfully verbose stacks.

This is a bad PR in the sense that it does a bunch of different things.
This is a collection of changes I made, mostly around improving
debugging with more verbose and specific error messages. But there's
also some other stuff in here. Most of these don't really seem
sufficiently important to break out into their own PR, but it seemed a
shame to just throw away things that seemed like improvements. I did
try to clean things up a bit to make them actually upstreamable.

One note is that I'm not really sure what the convention is for
Exception types in the project, so a lot of these are just RuntimeError.
That's not awesome, but I think it's still a lot more helpful than
nothing.
@GMNGeoffrey
Copy link
Contributor Author

Sorry, should've run tests after doing cleanup, obviously....

@GMNGeoffrey GMNGeoffrey marked this pull request as draft January 29, 2025 00:47
@raikonenfnu
Copy link
Contributor

I know it can get quite long, but would be lovely if we have the list of fixes on the PR message

There's still something suspicious going on with the index for reduction
ops, but my change causes breakage, so reverting it.
This requires changing too many tests to be part of this PR. Will try
to submit separately.
I'm not sure that this is correct. I hit some weird thing here, but this
may not be the fix.
@GMNGeoffrey
Copy link
Contributor Author

GMNGeoffrey commented Jan 29, 2025

I removed some of the more substantive changes, fixed the test failures, and wrote down more detail about the changes.

@GMNGeoffrey GMNGeoffrey marked this pull request as ready for review January 29, 2025 23:00
I added these while debugging the test failures from this PR
@GMNGeoffrey
Copy link
Contributor Author

Ping on this?

@harsh-nod
Copy link
Contributor

Can you break this into individual commits and we can start landing those

@GMNGeoffrey
Copy link
Contributor Author

Sure, it was mostly one PR so that I could send it all before I went on vacation. I'll look into breaking it up a bit

Make the callable passed to `CapturedTrace.walk` optional. If omitted,
`walk` just returns all the nodes.
@GMNGeoffrey
Copy link
Contributor Author

I've extracted a few PRs. I think what will remain here actually has a coherent single purpose of adding missing validation and error messages.

@GMNGeoffrey GMNGeoffrey changed the title Collection of QoL changes [TKW] Add missing validation and error messages Feb 28, 2025
@GMNGeoffrey
Copy link
Contributor Author

I think this is now sufficiently narrow to be a reasonable PR. It still has a few extraneous things like adding some missing return types to some functions, but that doesn't seem like something that's worth breaking off into its own PR and it also seems a bit silly to just delete them.

@GMNGeoffrey
Copy link
Contributor Author

The big diff in iree/turbine/kernel/wave/codegen/emitter.py is due to windows newlines, which I'm removing in #544

@GMNGeoffrey
Copy link
Contributor Author

Whoops, I accidentally merged #548 into all my PRs. Will revert once the CI runs finish

@GMNGeoffrey GMNGeoffrey enabled auto-merge (squash) March 6, 2025 01:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants