This repository was archived by the owner on Aug 7, 2024. It is now read-only.

Fix an issue in sync_amax #169

Closed
wants to merge 1 commit into from
Conversation

@y-sq (Contributor) commented Dec 21, 2023

This PR fixes the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor []] is at version 1; expected version 0 instead.

I also tried decorating the sync function:

@torch.no_grad()
def sync_float8_amax_and_scale_history(

which did not work.

We can look into whether there are better ways to fix this.


Test Plan:
./test/test_fsdp.sh
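For context, here is a minimal sketch of this class of error (an illustration, not the actual sync_amax code): PyTorch raises it when a tensor that was saved for the backward pass is later mutated in place, which bumps the tensor's version counter. Note that `@torch.no_grad()` does not prevent it, because no-grad mode only stops new operations from being recorded into the graph; an in-place op still increments the saved tensor's version.

```python
import torch

# sigmoid saves its output for backward, so mutating that output in place
# bumps its version counter (0 -> 1) and backward() fails with the error
# quoted above.
a = torch.tensor(2.0, requires_grad=True)
b = a.sigmoid()          # b is saved for the backward pass
try:
    b.mul_(2)            # in-place mutation of a saved tensor
    b.backward()
except RuntimeError as e:
    print("caught:", e)

# Out-of-place fix: build a new tensor instead of mutating the saved one.
a = torch.tensor(2.0, requires_grad=True)
b = a.sigmoid()
c = b * 2                # out-of-place; b's version stays at 0
c.backward()
print(a.grad)            # gradient flows as expected
```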

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 21, 2023
@y-sq y-sq marked this pull request as ready for review December 21, 2023 19:33
@facebook-github-bot: @y-sq has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot: @y-sq merged this pull request in 31fba04.
