This repository was archived by the owner on Aug 7, 2024. It is now read-only.

add numerical test for FSDP #9

Merged
merged 1 commit into from
Aug 2, 2023

Conversation

@vkuzo (Contributor) commented on Aug 2, 2023

Summary:

Adds a test for numerical equivalence of single GPU vs FSDP for a toy model.

Note: this is not related to fp8 yet; a future PR will add a test that this still holds for fp8.
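For readers who want a concrete picture of what a single GPU vs FSDP equivalence check can look like, here is a minimal sketch. It is not the code from this PR: `get_model` below is a hypothetical stand-in for the helper of the same name in the test, `reference_forward` and `fsdp_worker` are illustrative names, and the sketch only compares forward outputs. It assumes one CUDA device per rank and the NCCL backend.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def get_model(K: int, N: int) -> nn.Module:
    # Hypothetical toy model; the PR's real get_model is not shown on this page.
    return nn.Sequential(nn.Linear(K, N), nn.ReLU(), nn.Linear(N, N))


def reference_forward(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    # Single-device reference: plain forward pass on the full batch.
    return model(x).detach()


def fsdp_worker(rank, world_size, K, N, sd_in_fname, x, ref_out):
    # Assumption: one CUDA device per rank, NCCL backend, and a state dict
    # saved beforehand from the reference model at sd_in_fname.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = get_model(K, N).to(rank)
    model.load_state_dict(torch.load(sd_in_fname))
    model = FSDP(model)

    # Each rank runs its slice of the global batch and compares it with the
    # matching slice of the single-device output.
    local_x = x.chunk(world_size)[rank].to(rank)
    out = model(local_x).detach().cpu()
    torch.testing.assert_close(out, ref_out.chunk(world_size)[rank])

    dist.destroy_process_group()
```

One way to launch the worker would be `torch.multiprocessing.spawn(fsdp_worker, args=(world_size, K, N, sd_in_fname, x, ref_out), nprocs=world_size)`, which passes the rank as the first argument.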

Test Plan:

./float8_playground/test_fsdp.sh

Reviewers:

Subscribers:

Tasks:

Tags:

@facebook-github-bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Aug 2, 2023
@vkuzo merged commit c7deeed into main on Aug 2, 2023
@awgu left a comment

This looks reasonable to me!


model = get_model(K, N).to(rank)
model.load_state_dict(torch.load(sd_in_fname))
model = FSDP(model)

For more realistic models, we should use nested FSDP wrapping to be representative of a real workload. This can be achieved by passing an auto_wrap_policy argument. Feel free to message me if you have a particular model and need some clarification on how to pass a good auto_wrap_policy.
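As an illustration of the nested wrapping suggested here, a minimal sketch follows. The `Block` module, `build_model`, `make_policy`, and `wrap_nested` are hypothetical names introduced only for this example; `ModuleWrapPolicy` is one of the policies shipped in `torch.distributed.fsdp.wrap` (PyTorch 2.x), and other policies may suit a given model better.

```python
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import ModuleWrapPolicy


class Block(nn.Module):
    # Hypothetical repeated unit, standing in for e.g. a transformer layer.
    def __init__(self, dim: int):
        super().__init__()
        self.fc1 = nn.Linear(dim, 4 * dim)
        self.fc2 = nn.Linear(4 * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.fc2(torch.relu(self.fc1(x)))


def build_model(dim: int = 16, n_blocks: int = 4) -> nn.Module:
    return nn.Sequential(*[Block(dim) for _ in range(n_blocks)])


def make_policy() -> ModuleWrapPolicy:
    # Wrap every Block in its own FSDP unit, so parameters are gathered
    # and freed one block at a time instead of all at once.
    return ModuleWrapPolicy({Block})


def wrap_nested(model: nn.Module) -> FSDP:
    # Assumption: torch.distributed is already initialized and the model
    # has been moved to this rank's device.
    return FSDP(model, auto_wrap_policy=make_policy())
```

Compared with the flat `FSDP(model)` call in the test, this produces one FSDP unit per `Block` plus a root unit for whatever remains.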

Contributor Author

cool, thanks!

yep, that will come later; this is just a super simple test to iron out easy-to-catch issues.


3 participants