[wip] enable Float8Tensor as subgraph boundary #166
Conversation
Since I think that this might be hard to solve, should we make an issue tracking this and …
Summary:

This adds a couple of config options to unbreak autocast + compile + FSDP + Float8Linear. To enable these options, the user needs to do:

```
config.enable_amax_init = False
config.enable_pre_and_post_forward = False
```

The `enable_amax_init` config adds the option to disable amax initialization. The reason this is currently broken is:
1. FSDP is not full-graph friendly (regardless of compile)
2. the amax init function has a graph break in distributed code because it uses inplace distributed collectives. I did try to use functional collectives (#171), but that ran into numerical issues with compile, so for now just working around it.
3. graph breaks in Float8Linear code are not supported because of the issue documented in #166
4. so, as a workaround for all of the above, we just skip amax init for now. We do know from NVIDIA that this path is not needed for model convergence, and TE does not support this at all. It was nice for testing but not necessary for training jobs.

The second config option disables pre-forward and post-forward. I don't have a repro in a unit test for now, but this does unbreak LLaMa 7B on 8 GPUs with FSDP + compile. Specifically, the thing which is broken in pre-forward/post-forward is assignment on module attributes. My hunch is that this graph breaks if autocast + FSDP are on, and graph breaks are not supported due to (3) above.

Pull Request resolved: #172

Test Plan:

```
// unit / integration tests
with-proxy test/test_everything.sh

// run the LLaMa 7b trainer on 8 GPUs with autocast + compile + FSDP + Float8Linear, no compile errors
```

Reviewed By: drisspg

Differential Revision: D52468625

Pulled By: vkuzo

fbshipit-source-id: be4fac927b8520602ed018e96d7a49056e9c6e06
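For readers following along, a minimal usage sketch of the two workaround flags described above. The flag names come from the commit message; the import path (`float8_experimental.config`) and the surrounding setup are assumptions, not a prescribed recipe.

```python
# Sketch only: import path and setup are assumptions; the two flags are the
# ones named in the commit message above.
import torch
import float8_experimental.config as config

# Skip amax initialization, avoiding the in-place distributed collectives
# that graph-break under FSDP + compile.
config.enable_amax_init = False

# Skip the pre-forward/post-forward hooks, whose module-attribute assignment
# graph-breaks when autocast + FSDP are on.
config.enable_pre_and_post_forward = False

# ...then build the model, swap in Float8Linear, wrap with FSDP, and compile
# as usual, e.g. model = torch.compile(FSDP(model)).
```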
Force-pushed from 94d5188 to 3d2a32e.
It might be worth treating these two tests separately: (1) a test with a graph break in the middle, and (2) a test with no graph breaks, but where we manually pass a `Float8Tensor` as a graph input. That way we can tell if the problem is specific to "Float8Tensor as graph input", or if it is some more subtle bug around graph breaks.
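A rough sketch of how those two cases could be split into separate tests. The `Float8Tensor.to_float8` constructor, the `to_original_precision()` call, and the backend choice are assumptions for illustration, not copies of this repo's tests.

```python
import torch
from float8_experimental.float8_tensor import Float8Tensor  # assumed import path

def test_graph_break_in_middle():
    # Case (1): a Float8Tensor is live across a forced graph break, so it is an
    # output of the first subgraph and an input of the second.
    def fn(x):
        x_fp8 = Float8Tensor.to_float8(x, torch.tensor(1.0), torch.float8_e4m3fn)  # assumed API
        torch._dynamo.graph_break()
        return x_fp8.to_original_precision() + 1.0  # assumed API

    x = torch.randn(16, 16)
    torch.testing.assert_close(torch.compile(fn, backend="eager")(x), fn(x))

def test_float8tensor_as_graph_input():
    # Case (2): no graph breaks at all, but a Float8Tensor is passed in
    # directly as an input of the compiled region.
    def fn(x_fp8):
        return x_fp8.to_original_precision() + 1.0  # assumed API

    x_fp8 = Float8Tensor.to_float8(
        torch.randn(16, 16), torch.tensor(1.0), torch.float8_e4m3fn
    )  # assumed API
    torch.testing.assert_close(torch.compile(fn, backend="eager")(x_fp8), fn(x_fp8))
```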
Force-pushed from 3d2a32e to 37ab944.
Opened pytorch/pytorch#117115 to track this in the PyTorch repo.
Summary:

In https://github.com/pytorch/pytorch/pull/114311/files, the signature expected by traceable subclasses changed. This PR updates `Float8Tensor` to the new spec.

Note: doesn't work yet, need to debug. This may be blocking composability of Float8Linear + FSDP + torch.compile.

Test Plan:

```
pytest test/test_compile.py -s --sw -k graph_break
// currently broken
// logs: https://gist.github.com/vkuzo/ba98a01a459fb9c966f167d8ecca1780
```

Reviewers:

Subscribers:

Tasks:

Tags:
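For context, a sketch of what the traceable-subclass hooks look like in general. The field names, the metadata layout, and the extra `outer_size`/`outer_stride` arguments are assumptions based on the current upstream protocol, not a copy of this repo's `Float8Tensor`.

```python
import torch

class WrapperSketch(torch.Tensor):
    # Minimal wrapper subclass illustrating the flatten/unflatten protocol that
    # torch.compile uses to trace through tensor subclasses. __torch_dispatch__
    # is omitted, so this only demonstrates construction and (un)flattening.

    @staticmethod
    def __new__(cls, data, scale, orig_dtype):
        return torch.Tensor._make_wrapper_subclass(
            cls, data.shape, dtype=orig_dtype, device=data.device
        )

    def __init__(self, data, scale, orig_dtype):
        self._data = data           # low-precision payload
        self._scale = scale         # per-tensor scale
        self._orig_dtype = orig_dtype

    def __tensor_flatten__(self):
        # Names of the inner tensor attributes, plus non-tensor context.
        return ["_data", "_scale"], (self._orig_dtype,)

    @staticmethod
    def __tensor_unflatten__(inner_tensors, flatten_spec, outer_size, outer_stride):
        # Rebuild the subclass from inner tensors and context; outer_size and
        # outer_stride describe the wrapper's shape/stride during tracing.
        (orig_dtype,) = flatten_spec
        return WrapperSketch(inner_tensors["_data"], inner_tensors["_scale"], orig_dtype)
```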
Force-pushed from 37ab944 to 03149ef.
@bdhirsh, good idea. I added testing for input and output and added the results to the summary. The "Float8Tensor as output" case is fishy: we see fake tensors.
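A sketch of the "Float8Tensor as output" case being described, where the symptom is that the inner tensors of the returned subclass come back as fake tensors. The constructor and backend choice are assumptions for illustration.

```python
import torch
from float8_experimental.float8_tensor import Float8Tensor  # assumed import path

def to_fp8(x):
    # The compiled region constructs and returns a Float8Tensor, so the
    # subclass crosses the graph boundary as an output.
    return Float8Tensor.to_float8(x, torch.tensor(1.0), torch.float8_e4m3fn)  # assumed API

y = torch.compile(to_fp8, backend="aot_eager")(torch.randn(16, 16))
# Inspect whether the inner data tensor is a real tensor or a FakeTensor
# leaking out of tracing.
print(type(y), type(y._data))
```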
Summary:
Explorations to see how we can enable graph breaks with Float8Tensor at the boundary.
This may be blocking composability of Float8Linear + FSDP + torch.compile.
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags: