Conversation

@fmassa
Contributor

@fmassa fmassa commented Jul 1, 2025

Now that we are using DTensors to store the parameters, it's better to be safe and use the redistribution from DTensor. We can optimize this further once DTensor has order information.

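For context, a minimal sketch of what relying on DTensor's own redistribution looks like, as opposed to hand-rolling collectives. This is not code from this PR; the device mesh, placements, and tensor sizes below are illustrative assumptions:

```python
# Minimal sketch (assumed example, not from this PR): let DTensor pick the
# collectives needed to change a parameter's placement, rather than calling
# all_gather manually on the local shard.
import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import distribute_tensor, Shard, Replicate

dist.init_process_group("nccl")
mesh = init_device_mesh("cuda", (dist.get_world_size(),))

# Parameter stored as a DTensor, sharded along dim 0 across the mesh.
weight = distribute_tensor(torch.randn(1024, 512), mesh, placements=[Shard(0)])

# Ask DTensor to change the placement; it issues the correct collectives itself.
replicated = weight.redistribute(mesh, placements=[Replicate()])
full_weight = replicated.to_local()  # every rank now holds the full (1024, 512) tensor
```

Letting redistribute choose the collectives keeps the layout change correct by construction, whereas a hand-written all_gather over to_local() has to re-derive the sharding layout itself, which is where ordering assumptions can go wrong.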
@fmassa fmassa requested a review from wconstab July 1, 2025 13:52
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jul 1, 2025
Contributor

@wconstab wconstab left a comment


hmm, this PR makes a huge improvement in loss convergence, so we should land it.

(though maybe delete the lines of code instead of commenting them out?)

Though I do want to understand why the previous code was incorrect; I haven't looked into it carefully yet.

@wconstab wconstab merged commit 6b44ea6 into main Jul 1, 2025
4 checks passed
@wconstab wconstab deleted the fmassa/disable_collectives_optim branch July 1, 2025 19:59
bdhirsh pushed a commit that referenced this pull request Jul 1, 2025