Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace div op with combination of recip and mul #842

Closed
wants to merge 1 commit into from

Conversation

mrakitaTT
Copy link
Contributor

Issue: #841

@nvukobratTT
Copy link
Contributor

Instead of pushing this workaround in the core compiler, can we push priority on the TTNN side? I'm scared that leaving these as P1 will not get fixed soon, and we'll end up using this workaround in the long run.

@mrakitaTT
Copy link
Contributor Author

Instead of pushing this workaround in the core compiler, can we push priority on the TTNN side? I'm scared that leaving these as P1 will not get fixed soon, and we'll end up using this workaround in the long run.

@nvukobratTT Can't we do both? :) IMO when the workaround fix is easy to revert it is better to have it in and pursue the proper fix on the side, especially when it is blocking some demos.

@sdjordjevicTT
Copy link
Contributor

sdjordjevicTT commented Sep 28, 2024

@mrakitaTT before workarounding, can we first clarify with TTNN folks the issue and what is needed from their side to support proper solution? After we get feedback from them, we can easily decide whether and how we want to workaround it if necessary.

@mrakitaTT
Copy link
Contributor Author

@sdjordjevicTT The issue is pretty clear and from the conversation under the issue it seems that they are working on it but they can't agree on the proper fix, so I guess it is just a matter of time for them to reach agreement. I can ping them to see if they are close to resolution, though I was not that hopeful after reading full conversation (which I recommend you to do too if you haven't, some wild stuff there).

In the meantime, our compiler is producing graph that is failing in runtime in this case. I don't understand why is it the problem in general to have a workaround on our side until they fix their issue? Also the workaround I am using is the one that they suggested in the issue.

I would understand if I was introducing some big code change which would be hard to revert later or which affects other parts of the system, but this is literally plug in/plug out change maximally scoped out. I changed to different ConversionPattern and later we just reverse to the old ConversionPattern. Is there something that I am missing, is this going to affect optimization passes or something else?

Anyways, I don't really mind if you don't want to have this fix in main, only pain for us currently is that we have to keep it on all our side branches where we test MNIST model, so to ease that pain I wanted to push it into main. I'd just like to understand what is all the fuss about so I can do things different in the future because I think we are going to have many more situations like this 🙂 We are currently dealing with couple of metal issues and judging by their 1.6k open issues it might take a while.

@nsmithtt
Copy link
Contributor

This is an old PR, perhaps we can close?

@nsmithtt
Copy link
Contributor

nsmithtt commented Mar 2, 2025

Issue is over 5 months old closing, please reopen if still an issue!

@nsmithtt nsmithtt closed this Mar 2, 2025
@mrakitaTT
Copy link
Contributor Author

@nsmithtt if you can believe it, issue still exists :) But new PR should be opened anyways, since we now have a workarounds pass where this should be implemented. I've pinged Metal team in December about this, they moved it to "Done" in January and we actually didn't see an issue anymore, but @vladimirjovanovicTT's team now encountered it in softmax backwards. They will take over adding a workaround.

@vladimirjovanovicTT
Copy link
Contributor

I opened a new issue regarding div op here: #2350

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants