Conversation

@fmassa fmassa commented Aug 8, 2025

IMO we should just add the loss in the model and let autoparallel parallelize it for us. But for now, let's follow how the other models are implemented
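For context, a minimal sketch (not from this PR) of what "adding the loss in the model" could look like, so the loss computation is part of the graph a parallelizer sees; the wrapper class and all names below are illustrative assumptions:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ModelWithLoss(nn.Module):
    """Hypothetical wrapper: forward() returns the loss directly, so the
    loss lives inside the graph that gets traced and parallelized."""

    def __init__(self, model: nn.Module):
        super().__init__()
        self.model = model

    def forward(self, tokens: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        logits = self.model(tokens)  # (batch, seq, vocab)
        # Cross-entropy computed inside forward(), alongside the rest of the model.
        return F.cross_entropy(logits.flatten(0, 1).float(), labels.flatten(0, 1))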

@fmassa fmassa requested a review from wconstab August 8, 2025 09:44
@meta-cla meta-cla bot added the CLA Signed label Aug 8, 2025
# in our case, but we can work around it by
# casting the output to a DTensor on a 1d device mesh.
# We should just use AutoParallel to do this for us, but
# it would require putting the loss inside the model as well.
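As a rough illustration of the workaround described in the comment above, here is a hedged sketch of casting the plain-tensor output to a DTensor on a 1d device mesh and then computing the loss under loss parallelism; `tp_degree`, `model`, `inputs`, and `labels` are placeholders, not names from this PR:

import torch
import torch.nn.functional as F
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import DTensor, Shard
from torch.distributed.tensor.parallel import loss_parallel

tp_degree = 8                                  # assumed tensor-parallel size
mesh = init_device_mesh("cuda", (tp_degree,))  # 1d device mesh

# `local_logits` is this rank's shard of the vocab dimension: a plain
# torch.Tensor of shape (batch, seq, vocab // tp_degree).
local_logits = model(inputs)

# Cast the output to a DTensor sharded on the last (vocab) dimension.
logits = DTensor.from_local(local_logits, device_mesh=mesh, placements=[Shard(-1)])

# With the output expressed as a DTensor, the loss can be computed without
# gathering the full vocab dimension on every rank.
with loss_parallel():
    loss = F.cross_entropy(logits.flatten(0, 1), labels.flatten(0, 1))
    loss.backward()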
I agree that overall we should just put the loss in the model, but I like the approach here for now because it's useful to be as structurally similar to torchtitan as possible for drop-in purposes

@fmassa fmassa merged commit 3f04d22 into autoparallel Aug 10, 2025
2 checks passed
@fmassa fmassa deleted the fmassa/enable_loss_parallel branch August 10, 2025 17:12