bfloat16 support, and an attempt at homogenizing model_dtype & precision #54
Merged
Conversation
…kpoint at inference
93158fe enables …
TODO
For xlm-roberta-xl(xxl), which are natively fp32, I added this here: eole/eole/bin/convert/convert_HF.py Line 861 in 166a18b
to convert them to fp16. Since we can convert any kind of model (and more and more are in bf16), maybe by default we could keep the original dtype, and add a flag to force storage in another dtype.
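A minimal sketch of that idea, assuming a plain state-dict conversion step (the function name and `forced_dtype` parameter are hypothetical, not eole's actual API):

```python
# Hypothetical sketch: keep each tensor's original dtype unless a target
# dtype is explicitly forced; names here are illustrative, not eole's CLI.
from typing import Optional
import torch

def convert_state_dict(state_dict: dict, forced_dtype: Optional[str] = None) -> dict:
    target = {
        "fp16": torch.float16,
        "bf16": torch.bfloat16,
        "fp32": torch.float32,
    }.get(forced_dtype)
    if target is None:
        # Default: preserve whatever dtype the HF checkpoint shipped with.
        return state_dict
    # Only cast floating-point tensors; leave int buffers etc. untouched.
    return {
        name: t.to(target) if t.is_floating_point() else t
        for name, t in state_dict.items()
    }
```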
bfloat16
X = steps (screenshot)
X = relative time (screenshot)
It seems to work relatively plug-and-play, but we might need to adapt a few things optimizer-wise: we might investigate some bf16-specific implementations, e.g. https://github.com/arogozhnikov/adamw_bfloat16.
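For illustration, a minimal PyTorch sketch (not this PR's code; the module and hyperparameters are made up) of the concern: stock AdamW stores its state and applies updates in the parameter dtype, so with pure-bf16 weights very small steps can be rounded away, which is what bf16-aware optimizers like the one linked above try to address:

```python
# Minimal sketch, not eole code: a model cast entirely to bfloat16, trained
# with stock torch.optim.AdamW. Optimizer state and weight updates then live
# in bf16 (~8 mantissa bits), so tiny steps may round to zero.
import torch
import torch.nn as nn

model = nn.Linear(256, 256).to(torch.bfloat16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 256, dtype=torch.bfloat16)
for _ in range(5):
    loss = model(x).float().pow(2).mean()  # reduce in fp32 for a stable loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # update happens in bf16; small lr * grad can vanish
```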
precision // model_dtype homogenization
Previously, `model_dtype` was used for training, with some "precision" deduced and applied depending on some other settings (e.g. the optimizer), while `precision` was set in `PredictConfig` for inference. This PR proposes factoring `precision` out at the common `RunningConfig` level; `dtype` (the actual dtype the model is cast to for training) is deduced under the same conditions as before.
TODOs:
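To make the proposed factorization concrete, a hypothetical sketch, assuming pydantic-style configs (everything beyond `precision`, `RunningConfig`, and `PredictConfig` is illustrative, not eole's actual classes):

```python
# Hypothetical sketch of the factorization, not eole's actual config code.
from typing import Literal
import torch
from pydantic import BaseModel

class RunningConfig(BaseModel):
    # Shared by training and inference, replacing the training-only
    # model_dtype and the PredictConfig-only precision.
    precision: Literal["fp32", "fp16", "bf16"] = "fp16"

    @property
    def dtype(self) -> torch.dtype:
        # Actual dtype the model is cast to; the real deduction also
        # depends on other settings such as the optimizer.
        return {
            "fp32": torch.float32,
            "fp16": torch.float16,
            "bf16": torch.bfloat16,
        }[self.precision]

class PredictConfig(RunningConfig):
    pass  # inference inherits precision instead of defining its own field
```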