Make learning rate tensor (Backend) #3287
Conversation
This pull request was exported from Phabricator. Differential Revision: D62784577
Force-pushed from c69cd9b to 302de99
This pull request was exported from Phabricator. Differential Revision: D62784577
Force-pushed from 302de99 to 304ba8e
This pull request was exported from Phabricator. Differential Revision: D62784577
This pull request has been merged in fc822f2.
This pull request has been reverted by dab5144.
Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/386
Context (problem from Microve):
PT2 adds a guard on float inputs, so whenever a float input changes, the graph is recompiled. Because compilation itself is expensive, each recompilation can take from several minutes to more than 20 minutes.
In end-to-end training, there is a warm-up stage during which the learning rate is gradually increased to a pre-defined value.
For example, if the final learning rate is 0.02 and the warm-up spans 10k steps, the learning rate increases from 0 to 0.02 in increments of 0.000002 (0.02 / 10,000) per iteration. So, if we let PT2 recompile on every change, it will recompile 10k times.
For a tensor, the guard is only on its metadata (shape, dtype, etc.), not its value; as long as the metadata stays the same, updating the value does not trigger recompilation.
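As a minimal sketch of the guard behavior described above (illustrative only, not FBGEMM code; exact recompile behavior depends on the PyTorch version and Dynamo configuration): passing the learning rate as a Python float lets Dynamo guard on its value, while a 0-dim tensor is guarded only on its metadata and can be updated in place.

```python
import torch

@torch.compile
def sgd_step(weight, grad, lr):
    # Simple update whose only changing scalar input is the learning rate.
    return weight - lr * grad

weight, grad = torch.randn(8), torch.randn(8)

# Float learning rate: Dynamo specializes on the value, so each new
# warm-up value can trigger a recompile.
for step in range(1, 4):
    sgd_step(weight, grad, 0.000002 * step)

# Tensor learning rate: update the value in place; shape/dtype never
# change, so the compiled graph is reused across warm-up steps.
lr = torch.zeros(())
for step in range(1, 4):
    lr.fill_(0.000002 * step)
    sgd_step(weight, grad, lr)
```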
To prevent recompilation, we change the learning rate from a float to a tensor.
This, however, affects the existing TBE frontend and backend.
We will enable the learning rate as a tensor through the new unified interface (D50481991).
For backward compatibility, the old interface (V1), i.e.,
`split_embedding_codegen_lookup_{{ optimizer }}_function` and `split_embedding_codegen_lookup_{{ optimizer }}_function_cpu`,
will continue to take the learning rate as a `float`.

This diff:
- makes the learning rate a tensor in codegen
- keeps the learning rate as a float for kernel arguments
- adds an optional argument to OptimizerArgs for the V1 signature
- makes the old interface take the learning rate as a float and convert it to a tensor before passing it to autograd (see the sketch below)
- converts the learning rate back to a float before passing it to the kernels
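The following is a minimal Python sketch of the V1 compatibility path described in the list above. The function names and bodies are illustrative placeholders, not the actual FBGEMM codegen signatures; they only mirror the float → tensor → float flow of the old interface.

```python
import torch

def lookup_v1(weights, indices, learning_rate: float):
    # V1 entry point: keeps the float signature for backward compatibility
    # and wraps the value in a 0-dim tensor before handing it to autograd.
    lr_tensor = torch.tensor(learning_rate, dtype=torch.float32)
    return _autograd_lookup(weights, indices, lr_tensor)

def _autograd_lookup(weights, indices, lr_tensor: torch.Tensor):
    # Autograd and the backend see the learning rate as a tensor, so PT2
    # guards only on its metadata, not its value.
    return _kernel_wrapper(weights, indices, lr_tensor)

def _kernel_wrapper(weights, indices, lr_tensor: torch.Tensor):
    # The kernels still take a plain float, so convert back right before
    # the kernel call (the kernel launch itself is omitted in this sketch).
    lr = lr_tensor.item()
    return weights  # placeholder: the real code launches the optimizer kernel with `lr`
```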
Old interface:
```
         python  -> C++ lookup -> autograd -> backend -> kernel
lr type: (float)    (float)       (tensor)    (tensor)   (float)
```
PT2 unified interface (D50481991):
```
         python   -> C++ lookup -> autograd -> backend -> kernel
lr type: (tensor)    (tensor)      (tensor)    (tensor)   (float)
```
Differential Revision: D62784577