Make learning rate tensor (Backend) #3287


Closed

Conversation

@spcyppt (Contributor) commented Oct 29, 2024

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/386

Context (problem from Microve):
PT2 adds a guard on float inputs; whenever the value changes, the function is recompiled. Compilation itself is expensive, so each recompilation can take several minutes to more than 20 minutes.
In end-to-end training there is a warm-up stage in which the learning rate is gradually increased to a pre-defined value.
For example, if the final learning rate is 0.02 and the warm-up runs for 10k steps, the learning rate increases from 0 to 0.02 in increments of 0.000002 (each iteration adds 0.000002). If we let PT2 recompile on every change, it will recompile 10k times.
For a tensor, the guard is only on its shape; as long as the shape stays the same, updating the value does not trigger recompilation.
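
The minimal sketch below (not part of this PR; it assumes a generic `torch.compile`d function) illustrates the guard behavior described above: passing the learning rate as a Python float re-triggers compilation whenever the value changes, while passing it as a 0-d tensor that is updated in place reuses the compiled graph.

```python
import torch

@torch.compile
def apply_lr(grad, lr):
    # Scale a gradient by the learning rate.
    return grad * lr

grad = torch.randn(8)

# Python float: PT2 guards on the scalar value, so every warm-up step
# that changes the learning rate forces a recompilation.
for step in range(1, 4):
    apply_lr(grad, 0.000002 * step)

# 0-d tensor: the guard covers the tensor's shape, not its value, so
# updating it in place each step reuses the already-compiled graph.
lr = torch.zeros(())
for step in range(1, 4):
    lr.fill_(0.000002 * step)
    apply_lr(grad, lr)
```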


To prevent this recompilation, we change the learning rate from a float to a tensor.
This, however, affects the existing TBE frontend and backend.

We will enable the learning rate as a tensor through the new unified interface (D50481991).
For backward compatibility, the old interface (V1), i.e., `split_embedding_codegen_lookup_{{ optimizer }}_function` and `split_embedding_codegen_lookup_{{ optimizer }}_function_cpu`, will continue to take the learning rate as a `float`.

This diff:

  • makes the learning rate a tensor in codegen
  • keeps the learning rate as a float for kernel arguments
  • adds an optional argument to OptimizerArgs for the v1 signature
  • makes the old interface take the learning rate as a float and convert it to a tensor before passing it to autograd
  • converts the learning rate back to a float before passing it to the kernels (see the sketch after the type-flow diagrams below)

Old interface:

```
          python -> C++ lookup -> autograd -> backend -> kernel
lr type:  (float)     (float)     (tensor)   (tensor)   (float)
```

PT2 unified interface (D50481991):

```
          python -> C++ lookup -> autograd -> backend -> kernel
lr type:  (tensor)   (tensor)     (tensor)   (tensor)   (float)
```
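
As a rough illustration of the V1 compatibility path (a hypothetical Python sketch, not the generated FBGEMM code; `lookup_v1`, `autograd_backward`, and `kernel_update` are made-up stand-ins for the lookup, autograd, and kernel layers), the old interface wraps the float into a tensor before autograd and unwraps it again at the kernel boundary:

```python
import torch

def kernel_update(weights, grad, lr: float):
    # Kernel layer: still receives the learning rate as a plain float.
    weights.add_(grad, alpha=-lr)

def autograd_backward(weights, grad, lr_tensor: torch.Tensor):
    # Autograd/backend layers: carry the learning rate as a 0-d tensor,
    # so PT2 guards on its shape rather than its value.
    kernel_update(weights, grad, lr_tensor.item())

def lookup_v1(weights, grad, learning_rate: float):
    # Old (V1) interface: keeps the float signature for backward
    # compatibility and converts to a tensor before handing off to autograd.
    autograd_backward(weights, grad, torch.tensor(learning_rate))

weights, grad = torch.ones(4), torch.ones(4)
lookup_v1(weights, grad, 0.02)
```

Under the unified interface, the tensor arrives directly from Python, and only the final kernel boundary converts it back to a float.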

Differential Revision: D62784577

@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D62784577

netlify bot commented Oct 29, 2024:

Deploy Preview for pytorch-fbgemm-docs ready!

🔨 Latest commit: 304ba8e
🔍 Latest deploy log: https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/6723fd7427a57200084d7415
😎 Deploy Preview: https://deploy-preview-3287--pytorch-fbgemm-docs.netlify.app

spcyppt added a commit to spcyppt/FBGEMM that referenced this pull request Oct 29, 2024

@facebook-github-bot (Contributor) commented:

This pull request has been merged in fc822f2.

@facebook-github-bot (Contributor) commented:

This pull request has been reverted by dab5144.

q10 pushed a commit to q10/FBGEMM that referenced this pull request Apr 10, 2025