float8: remove unneeded kernel for scale generation #616

Merged: 3 commits merged on Aug 7, 2024

@vkuzo vkuzo commented Aug 6, 2024

Summary:

The code that creates a float8 scale launches an unnecessary extra GPU
kernel by calling `torch.empty`. This PR removes that call.

There is no performance impact, but the change makes debugging easier by reducing log size and simplifying GPU traces.

Test Plan:

```
// extract trace of a linear fwd+bwd with
python benchmarks/float8/profile_linear_float8.py ~/local/tmp/test
// verify that the GPU kernel creating an empty scale tensor is no longer there

// unit tests pass
./test/float8/test_everything.sh
```


vkuzo added 2 commits August 6, 2024 14:29
[ghstack-poisoned]
[ghstack-poisoned]

pytorch-bot bot commented Aug 6, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/616

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit bb3106d with merge base 8bba8ed:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 6, 2024
vkuzo added a commit that referenced this pull request Aug 6, 2024
ghstack-source-id: 8be2b5d
ghstack-comment-id: 2272205849
Pull Request resolved: #616
@vkuzo vkuzo requested review from drisspg and y-sq August 6, 2024 21:46
[ghstack-poisoned]
@vkuzo vkuzo changed the base branch from gh/vkuzo/6/head to main August 7, 2024 15:52
@vkuzo vkuzo merged commit d582f9a into main Aug 7, 2024
13 checks passed
jainapurva pushed a commit that referenced this pull request Aug 7, 2024
4 participants