bfloat16 support, and an attempt at homogenizing model_dtype & precision #54
Merged
Conversation
…kpoint at inference
93158fe enables …
TODO
For xlm-roberta-xl(xxl), which are natively fp32, I added this here: eole/eole/bin/convert/convert_HF.py Line 861 in 166a18b
to convert them to fp16. Since we can convert any kind of model (and more and more are in bf16), maybe by default we could keep the original dtype, and add a flag to force storage in another dtype.
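A minimal sketch of that idea, assuming a plain state-dict conversion step (the function name and `forced_dtype` parameter are hypothetical, not eole's actual API):

```python
# Hypothetical sketch: keep each tensor's original dtype unless a target
# dtype is explicitly forced; names here are illustrative, not eole's CLI.
from typing import Optional
import torch

def convert_state_dict(state_dict: dict, forced_dtype: Optional[str] = None) -> dict:
    target = {
        "fp16": torch.float16,
        "bf16": torch.bfloat16,
        "fp32": torch.float32,
    }.get(forced_dtype)
    if target is None:
        # Default: preserve whatever dtype the HF checkpoint shipped with.
        return state_dict
    # Only cast floating-point tensors; leave int buffers etc. untouched.
    return {
        name: t.to(target) if t.is_floating_point() else t
        for name, t in state_dict.items()
    }
```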
bfloat16
X = steps (screenshot)
X = relative time (screenshot)
It seems to work relatively plug-and-play, but we might need to adapt a few things optimizer-wise: we might investigate some bf16-specific implementations, e.g. https://github.com/arogozhnikov/adamw_bfloat16.
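For illustration, a minimal PyTorch sketch (not this PR's code; the module and hyperparameters are made up) of the concern: stock AdamW stores its state and applies updates in the parameter dtype, so with pure-bf16 weights very small steps can be rounded away, which is what bf16-aware optimizers like the one linked above try to address:

```python
# Minimal sketch, not eole code: a model cast entirely to bfloat16, trained
# with stock torch.optim.AdamW. Optimizer state and weight updates then live
# in bf16 (~8 mantissa bits), so tiny steps may round to zero.
import torch
import torch.nn as nn

model = nn.Linear(256, 256).to(torch.bfloat16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 256, dtype=torch.bfloat16)
for _ in range(5):
    loss = model(x).float().pow(2).mean()  # reduce in fp32 for a stable loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # update happens in bf16; small lr * grad can vanish
```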
precision // model_dtype homogenization
Previously, `model_dtype` was used for training, with some "precision" deduced and applied depending on some other settings (e.g. the optimizer), while `precision` was set in `PredictConfig` for inference. This PR proposes factoring `precision` out at the common `RunningConfig` level; `dtype` (the actual dtype the model is cast to for training) is deduced under the same conditions as before.
TODOs:
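To make the proposed factorization concrete, a hypothetical sketch, assuming pydantic-style configs (everything beyond `precision`, `RunningConfig`, and `PredictConfig` is illustrative, not eole's actual classes):

```python
# Hypothetical sketch of the factorization, not eole's actual config code.
from typing import Literal
import torch
from pydantic import BaseModel

class RunningConfig(BaseModel):
    # Shared by training and inference, replacing the training-only
    # model_dtype and the PredictConfig-only precision.
    precision: Literal["fp32", "fp16", "bf16"] = "fp16"

    @property
    def dtype(self) -> torch.dtype:
        # Actual dtype the model is cast to; the real deduction also
        # depends on other settings such as the optimizer.
        return {
            "fp32": torch.float32,
            "fp16": torch.float16,
            "bf16": torch.bfloat16,
        }[self.precision]

class PredictConfig(RunningConfig):
    pass  # inference inherits precision instead of defining its own field
```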