
switch from just-in-time scaling to delayed scaling #18

Merged 1 commit into main on Aug 7, 2023
Conversation

vkuzo
Contributor

@vkuzo commented on Aug 7, 2023

Summary:

Before: all scaling was done just-in-time
After:

  1. scaling is done in a delayed fashion with an amax history of 1 (see the sketch below)
  2. there is special logic to populate the initial amaxes (TE doesn't have this)

A future PR will add a windowed amax calculation.
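
For context, here is a minimal sketch of the delayed-scaling flow described above, assuming a PyTorch build that exposes the float8 dtypes. The class and method names are illustrative, not this repo's actual API:

```python
import torch

# Sketch of delayed scaling with an amax history of 1: the scale for
# the current iteration is derived from the amax observed on the
# previous iteration, rather than computed just-in-time.
class DelayedScaler:
    def __init__(self, float8_dtype=torch.float8_e4m3fn):
        self.float8_dtype = float8_dtype
        self.amax = None  # history of 1: a single stored amax value

    def scale(self, x: torch.Tensor) -> torch.Tensor:
        if self.amax is None:
            # first iteration: there is no history yet, so populate
            # the initial amax just-in-time from the current tensor
            self.amax = x.abs().max()
        fp8_max = torch.finfo(self.float8_dtype).max
        # derive the scale from the previous iteration's amax
        s = fp8_max / torch.clamp(self.amax, min=1e-12)
        # record the current amax for use on the next iteration
        self.amax = x.abs().max()
        return s
```

With a history of 1 the stored amax is simply overwritten each step; the windowed calculation planned for a future PR would instead keep the last N amaxes and reduce over them.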

Test Plan:

```
with-proxy ./tests/test_everything.sh
```

Reviewers:

Subscribers:

Tasks:

Tags:
@facebook-github-bot added the CLA Signed label on Aug 7, 2023
@vkuzo merged commit 145f31a into main on Aug 7, 2023
vkuzo added a commit that referenced this pull request on Aug 14, 2023
Summary:

In #18, the
MNIST finetuning script broke because the casts were not saturated.

By default, casts to float8 are not saturated. With delayed scaling, the scale is derived from a stale amax, so the current tensor can exceed the representable range, and we need to saturate to avoid `NaN`s everywhere. For now, the saturation logic is written in eager mode (sketched below).

In the future we would ideally lower this to a hardware-accelerated saturated cast via PT2.0.
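
As an illustration, such a saturated cast can be sketched in eager mode as follows (hypothetical helper name; `float8_e4m3fn` is an assumed target dtype, not necessarily what this commit uses):

```python
import torch

# Saturated cast sketch: values outside the float8 representable range
# are clamped to the finite max/min before conversion, instead of
# overflowing to NaN on the unsaturated cast.
def saturated_cast_fp8(x: torch.Tensor, float8_dtype=torch.float8_e4m3fn):
    max_val = torch.finfo(float8_dtype).max  # 448.0 for e4m3fn
    return x.clamp(min=-max_val, max=max_val).to(float8_dtype)
```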

Test Plan:

```
# loss now converges again
with-proxy python finetune/mnist.py --batch-size 4096 --use-pt-fp8
```

Reviewers:

Subscribers:

Tasks:

Tags: