
Add support for loss parallel #1546


Merged: 3 commits from fmassa/enable_loss_parallel into autoparallel on Aug 10, 2025

Conversation

fmassa (Member) commented on Aug 8, 2025

IMO we should just add the loss in the model and let autoparallel parallelize it for us. But for now, let's follow how the other models are implemented.

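For reference, a minimal sketch of what "adding the loss in the model" could look like, so that a parallelizer such as autoparallel would also see (and get to shard) the loss computation. The wrapper class and names below are hypothetical and not part of this PR:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModelWithLoss(nn.Module):
    """Hypothetical wrapper: keeping the loss inside forward() means a
    parallelizer that traces the model also sees the cross-entropy and can
    decide how to parallelize it, instead of the training loop handling it."""

    def __init__(self, model: nn.Module):
        super().__init__()
        self.model = model

    def forward(self, tokens: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        logits = self.model(tokens)  # [batch, seq, vocab]
        # Flatten batch and sequence dims so cross_entropy sees [N, vocab] / [N].
        return F.cross_entropy(logits.flatten(0, 1), labels.flatten(0, 1))
```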
fmassa requested a review from wconstab on August 8, 2025 at 09:44
meta-cla bot added the CLA Signed label on Aug 8, 2025
# in our case, but we can work around it by
# casting the output to a DTensor on a 1d device mesh.
# We should just use AutoParallel to do this for us, but
# it would require putting the loss inside the model as well
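A rough sketch of the workaround this comment describes, assuming a recent PyTorch with the public torch.distributed.tensor API: the model's plain local logits are wrapped into a DTensor on a 1d device mesh so the loss can be computed under loss_parallel(). The mesh size, tensor shapes, and variable names are illustrative, not taken from this PR:

```python
# Illustrative only; run under torchrun with a world size matching the mesh.
import torch
import torch.nn.functional as F
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import DTensor, Shard
from torch.distributed.tensor.parallel import loss_parallel

tp_mesh = init_device_mesh("cuda", (8,))  # hypothetical 1d tensor-parallel mesh
batch, seq, vocab = 4, 128, 32000         # made-up sizes

# In a real setup the vocab-sharded logits would come out of the model's
# tensor-parallel output projection; here we fake the local shard and cast it.
local_logits = torch.randn(batch, seq, vocab // 8, device="cuda", requires_grad=True)
logits = DTensor.from_local(local_logits, tp_mesh, [Shard(2)])  # sharded on the vocab dim
labels = torch.randint(0, vocab, (batch, seq), device="cuda")

with loss_parallel():
    # Cross-entropy over vocab-sharded DTensor logits: the full vocab
    # dimension is never gathered onto a single rank.
    loss = F.cross_entropy(logits.flatten(0, 1), labels.flatten(0, 1))
    loss.backward()
```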
Contributor
I agree that overall we should just put the loss in the model, but I like the approach here for now because it's useful to stay as structurally similar to torchtitan as possible for drop-in purposes.

fmassa merged commit 3f04d22 into autoparallel on Aug 10, 2025
2 checks passed
fmassa deleted the fmassa/enable_loss_parallel branch on August 10, 2025 at 17:12