This repository was archived by the owner on Aug 7, 2024. It is now read-only.

enable autocast + compile + FSDP + Float8Linear #172

Closed
wants to merge 1 commit

Conversation

@vkuzo (Contributor) commented on Dec 31, 2023

Summary:

This adds two config options to unbreak autocast + compile + FSDP + Float8Linear. To enable them, the user sets:

config.enable_amax_init = False
config.enable_pre_and_post_forward = False
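
For context, a minimal usage sketch (assuming the flags live on the library's config module, as the lines above suggest; the conversion of the model's linears to Float8Linear is elided):

```python
# Sketch only: the import path below is assumed from the config lines above.
from float8_experimental import config

# work around the autocast + compile + FSDP issues described below
config.enable_amax_init = False
config.enable_pre_and_post_forward = False
```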

The `enable_amax_init` config adds the option to disable amax initialization. Amax init is currently broken in this setting for the following reasons:

  1. FSDP is not full-graph friendly (regardless of compile)
  2. the amax init function has a graph break in distributed code because it uses in-place distributed collectives (see the sketch after this list). I did try functional collectives ([wip] make Float8Linear amax init more FSDP+compile friendly #171), but that ran into numerical issues with compile, so for now we just work around it.
  3. graph breaks in Float8Linear code are not supported because of the issue documented in [wip] enable Float8Tensor as subgraph boundary #166
  4. so, as a workaround for all of the above, we just skip amax init for now. We know from NVIDIA that this path is not needed for model convergence, and TE (Transformer Engine) does not support it at all. It was nice for testing but is not necessary for training jobs.
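
To make point (2) concrete, here is a minimal sketch (not the actual amax-init code) of the in-place collective pattern that introduces the graph break, next to the functional-collective shape tried in #171; exact functional-collectives argument forms vary across PyTorch versions:

```python
# Sketch only: assumes torch.distributed is already initialized.
import torch
import torch.distributed as dist

def sync_amax_inplace(amax: torch.Tensor) -> torch.Tensor:
    # In-place all-reduce mutates `amax`; calls like this are what
    # graph-break inside a compiled region.
    dist.all_reduce(amax, op=dist.ReduceOp.MAX)
    return amax

def sync_amax_functional(amax: torch.Tensor) -> torch.Tensor:
    # Functional collectives return a new tensor instead of mutating,
    # which is friendlier to compile, but per the summary this route hit
    # numerical issues under compile.
    from torch.distributed import _functional_collectives as funcol
    return funcol.all_reduce(amax, "max", dist.group.WORLD)
```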

The second config option, `enable_pre_and_post_forward`, disables the pre-forward and post-forward hooks. I don't have a unit-test repro for now, but this does unbreak LLaMa 7B on 8 GPUs with FSDP + compile. Specifically, the thing that breaks in pre-forward/post-forward is assignment to module attributes; my hunch is that this graph-breaks when autocast + FSDP are on, and graph breaks are not supported due to (3) above.
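
As a rough illustration of the pattern being skipped (a generic sketch with made-up attribute names, not the actual Float8Linear hook code):

```python
# Sketch only: stand-in for a module that does pre/post-forward bookkeeping.
import torch
import torch.nn as nn

class HookedLinear(nn.Linear):
    # hypothetical switch mirroring config.enable_pre_and_post_forward
    enable_pre_and_post_forward = True

    def forward(self, x):
        if self.enable_pre_and_post_forward:
            # module-attribute assignment: the kind of mutation suspected of
            # graph-breaking when autocast + FSDP are both enabled
            self.forward_calls = getattr(self, "forward_calls", 0) + 1
        y = super().forward(x)
        if self.enable_pre_and_post_forward:
            self.last_output_dtype = y.dtype
        return y
```

Turning the switch off skips these assignments entirely, which keeps the forward free of module-state mutation.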

Test Plan:

// unit / integration tests
with-proxy test/test_everything.sh

// run the LLaMa 7b trainer on 8 GPUs with autocast + compile + FSDP + Float8Linear, no compile errors
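
For readers reproducing the shape of that run, a minimal sketch of the composition being exercised (not the actual trainer; assumes torch.distributed is initialized, the model's linears have already been swapped for Float8Linear, and model/data are on the right CUDA device):

```python
# Sketch only: per-rank autocast + compile + FSDP composition.
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_model(model: torch.nn.Module) -> torch.nn.Module:
    # FSDP first (use_orig_params=True is required for torch.compile),
    # then compile on top of the wrapped module.
    return torch.compile(FSDP(model, use_orig_params=True))

def one_step(model: torch.nn.Module, batch: torch.Tensor) -> torch.Tensor:
    # autocast on the outside, matching the LLaMa 7B run described above
    with torch.autocast("cuda", dtype=torch.bfloat16):
        loss = model(batch).float().mean()
    loss.backward()
    return loss
```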

Reviewers:

Subscribers:

Tasks:

Tags:

@facebook-github-bot added the CLA Signed label on Dec 31, 2023
@vkuzo force-pushed the 20231229_fsdp_autocast_compile_test branch from 4fa2654 to e3ab9c9 on December 31, 2023 19:41
@facebook-github-bot

@vkuzo has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@@ -335,7 +342,7 @@ def forward(self, x):
         y = self.cast_y_to_float8_in_bw(y, self.emulate)

         if self.bias is not None:
-            y = y + self.bias.to(self.bias_dtype)
+            y = y + self.bias.to(y.dtype)
@vkuzo (author) commented on this change:

This change is not covered by the new config options above; it just removes the need to store a module attribute, which also makes this code more friendly to full-graph compile. Modifying module attributes seems to graph-break when autocast and FSDP are both enabled.
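
A tiny standalone illustration of why reading the dtype from the local tensor works (a sketch, not the Float8Linear code): under autocast the matmul output dtype follows the autocast dtype, so casting the bias to `y.dtype` matches without tracking any module state.

```python
import torch

x = torch.randn(4, 8)
w = torch.randn(16, 8)
bias = torch.randn(16)  # float32

with torch.autocast("cpu", dtype=torch.bfloat16):
    y = torch.nn.functional.linear(x, w)  # bf16 under autocast
    out = y + bias.to(y.dtype)            # dtype read locally, no module attribute

print(y.dtype, out.dtype)  # torch.bfloat16 torch.bfloat16
```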

@drisspg (Contributor) left a comment:

Okay this makes sense

@vkuzo force-pushed the 20231229_fsdp_autocast_compile_test branch from e3ab9c9 to 2d97dde on January 2, 2024 23:09
@facebook-github-bot

@vkuzo has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot

@vkuzo merged this pull request in 120e752.
