-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CI] Print program memory usage (PM) along with kernel time #953
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Great to keep track of this as well!
Something isn't as expected, the printed ranges for elf sizes are too large, https://github.com/nod-ai/iree-amd-aie/actions/runs/12147431541/job/33875335756 32 KB and still working? |
Yeah, those sizes don't look correct. |
So here the assumption is the elf file size exactly equals to program memory usage? Not sure if it has any other contents which makes it larger than program memory. |
Yeah that was my assumption, and it's incorrect. Ultimately we want the number checked here, where the error is thrown we go OOM: https://github.com/Xilinx/aie-rt/blob/3fdf8d8d9a7a2abcd8df76155f950680607eac8a/driver/src/core/xaie_elfloader.c#L229 but I don't think there's a simple way to get that. The call stack down to there looks like // in the iree-amd-aie side: // on the aie_rt side: Those last 2 calls are slicing into the elf C object using pointers which I don't think we can reasonably access. That is on the side where the elf is used to create the CDO. Before that there's a step for creating the elf (see aie2xclbin function), but I don't see anything in there either which will give us the magic number. |
I see. So if this number doesn't accurately show the PM usage, I think it's not very useful to print it out. |
Yup. I'm going to do some experiments to see if it as a good indicator of PM though, and if it is then it might be useful. |
4010452
to
2641d30
Compare
Thanks, looks good to me. We could consider throwing errors to catch it when this changes, but it won't be a 100% guarantee. We could check for example that the value is in range ]0, 16KB] or even [1KB, 16KB] or so. |
Makes sense, I'll add a check for <= 16384 bytes. I realized that my comment in the description that we'll see this number even when we go OOM is wrong. For that, we'd need to analyze the elf in aie2xclbin. Follow-up work... |
For the tiny |
This will be useful to see how close we are to overflowing program memory (16384 B). In the case where we do overflow, we won't see how much PM we require. For that, we'll need to add a check in aie2xclbin -- future work.
UPDATE: the size if the .elf file isn't what we want, the elf file contains other stuff. I discovered that the program memory size is stored at byte 72 of the elf (see newling/aie-rt@d0f08bc for how I found this)