This repository was archived by the owner on Aug 7, 2024. It is now read-only.

Fix an issue in sync_amax #169

Closed
wants to merge 1 commit into from
Conversation

@y-sq (Contributor) commented Dec 21, 2023

This PR fixes the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor []] is at version 1; expected version 0 instead.

I also tried decorating the sync function:

@torch.no_grad()
def sync_float8_amax_and_scale_history(

which did not work.

We can look into whether there are better ways to fix this.


Test Plan:
./test/test_fsdp.sh
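For context, here is a minimal sketch of this class of error (an illustration, not the actual sync_amax code): PyTorch raises it when a tensor that was saved for the backward pass is later mutated in place, which bumps the tensor's version counter. Note that `@torch.no_grad()` does not prevent it, because no-grad mode only stops new operations from being recorded into the graph; an in-place op still increments the saved tensor's version.

```python
import torch

# sigmoid saves its output for backward, so mutating that output in place
# bumps its version counter (0 -> 1) and backward() fails with the error
# quoted above.
a = torch.tensor(2.0, requires_grad=True)
b = a.sigmoid()          # b is saved for the backward pass
try:
    b.mul_(2)            # in-place mutation of a saved tensor
    b.backward()
except RuntimeError as e:
    print("caught:", e)

# Out-of-place fix: build a new tensor instead of mutating the saved one.
a = torch.tensor(2.0, requires_grad=True)
b = a.sigmoid()
c = b * 2                # out-of-place; b's version stays at 0
c.backward()
print(a.grad)            # gradient flows as expected
```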

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 21, 2023
@y-sq y-sq marked this pull request as ready for review December 21, 2023 19:33
@facebook-github-bot: @y-sq has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot: @y-sq merged this pull request in 31fba04.
