
sasarkar/fusedrope inp bf16 #1026

Merged: 2 commits into main from sasarkar/fusedrope_inp_bf16 on Jun 6, 2024
Conversation

ssarkar2 (Collaborator)

What does this PR do?

Pulling in this commit: HabanaAI@1d44433
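
As a rough illustration of the change (the names and structure below are hypothetical sketches, not the actual HabanaAI kernel code): the difference is between upcasting bf16 inputs to fp32 around the RoPE op and passing them through in their native dtype.

```python
import torch

def apply_rope_fp32_roundtrip(q: torch.Tensor, cos: torch.Tensor,
                              sin: torch.Tensor, rope_fn) -> torch.Tensor:
    # Pre-change pattern: upcast bf16 inputs to fp32 around the fused op,
    # then cast the result back to the input dtype.
    return rope_fn(q.float(), cos.float(), sin.float()).to(q.dtype)

def apply_rope_native_dtype(q: torch.Tensor, cos: torch.Tensor,
                            sin: torch.Tensor, rope_fn) -> torch.Tensor:
    # Post-change pattern: the fused op accepts bf16 inputs directly,
    # skipping the fp32 round trip and the extra casts it costs.
    return rope_fn(q, cos, sin)
```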

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@ssarkar2 ssarkar2 requested review from mandy-li and libinta as code owners May 30, 2024 23:35
@ssarkar2 ssarkar2 requested a review from a user May 30, 2024 23:35
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

hsubramony added a commit that referenced this pull request May 31, 2024
regisss (Collaborator) left a comment


Have you checked the accuracy of the trained models with this change? RoPE is quite sensitive to the data type that is used to compute it.
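
For context on that sensitivity, here is a small self-contained check (illustrative only, not from this PR) of how far a rotate-half RoPE computed end to end in bf16 drifts from an fp32 reference:

```python
import torch

torch.manual_seed(0)
head_dim, seq_len = 128, 1024

# Standard RoPE frequency table (base 10000), as used by Llama-style models.
inv_freq = 1.0 / (10000 ** (torch.arange(0, head_dim, 2).float() / head_dim))
freqs = torch.outer(torch.arange(seq_len).float(), inv_freq)
emb = torch.cat((freqs, freqs), dim=-1)
cos, sin = emb.cos(), emb.sin()

def rope(x, cos, sin):
    # Rotate-half formulation of RoPE.
    x1, x2 = x.chunk(2, dim=-1)
    return x * cos + torch.cat((-x2, x1), dim=-1) * sin

q = torch.randn(seq_len, head_dim)
ref = rope(q, cos, sin)                                    # fp32 reference
low = rope(q.bfloat16(), cos.bfloat16(), sin.bfloat16())   # bf16 end to end

# bf16 keeps only ~8 mantissa bits, so expect absolute errors around 1e-2 here.
print((ref - low.float()).abs().max())
```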

ssarkar2 (Collaborator, Author) commented Jun 5, 2024

For a small text generation experiment:

```bash
python run_generation.py --model_name_or_path /mnt/weka/data/llama_inference/Llama-2-7b-chat-hf/ --use_kv_cache --max_new_tokens 100 --bf16 --batch_size 4 --use_hpu_graph --trim_logits --bucket_size 128 --bucket_internal --reuse_cache --use_flash_attention
python run_generation.py --model_name_or_path /mnt/weka/data/llama_inference/Llama-2-7b-chat-hf/ --use_kv_cache --max_new_tokens 1024 --max_input_tokens 1024 --bf16 --batch_size 4 --use_hpu_graph --trim_logits --bucket_size 128 --bucket_internal --reuse_cache --use_flash_attention
```

Got exactly the same outputs with and without the change.
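
(One hypothetical way to verify this, assuming each run's generations were redirected to a file; `out_main.txt` and `out_branch.txt` are made-up names:)

```python
# Hypothetical byte-for-byte comparison of saved generations, e.g. produced by
# `python run_generation.py ... > out_main.txt` on each branch.
with open("out_main.txt") as a, open("out_branch.txt") as b:
    assert a.read() == b.read(), "generation outputs diverged"
print("outputs identical")
```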

Training:

```bash
python3 run_lora_clm.py \
    --model_name_or_path pathto/Llama-2-7b-chat-hf/ \
    --dataset_name tatsu-lab/alpaca \
    --bf16 True \
    --output_dir ./model_lora_llama \
    --num_train_epochs 3 \
    --per_device_train_batch_size 16 \
    --evaluation_strategy "no" \
    --save_strategy "no" \
    --learning_rate 1e-4 \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "constant" \
    --max_grad_norm 0.3 \
    --logging_steps 1 \
    --do_train \
    --do_eval \
    --use_habana \
    --use_lazy_mode \
    --throughput_warmup_steps 3 \
    --lora_rank=8 \
    --lora_alpha=16 \
    --lora_dropout=0.05 \
    --lora_target_modules "q_proj" "v_proj" \
    --dataset_concatenation \
    --max_seq_length 512 \
    --low_cpu_mem_usage True \
    --validation_split_percentage 4 \
    --adam_epsilon 1e-08
```

On `main`:

```
***** train metrics *****
  epoch                       =         3.0
  max_memory_allocated (GB)   =       74.26
  memory_allocated (GB)       =       57.39
  total_flos                  = 703666335GF
  total_memory_available (GB) =       94.62
  train_loss                  =      0.9154
  train_runtime               =  0:32:29.62
  train_samples_per_second    =      19.317
  train_steps_per_second      =       1.208
06/05/2024 23:11:20 - INFO - __main__ -   *** Evaluate ***
[INFO|trainer.py:1779] 2024-06-05 23:11:20,349 >> ***** Running Evaluation *****
[INFO|trainer.py:1781] 2024-06-05 23:11:20,349 >>   Num examples = 501
[INFO|trainer.py:1784] 2024-06-05 23:11:20,349 >>   Batch size = 8
100%|██████████| 63/63 [00:12<00:00,  5.05it/s]
***** eval metrics *****
  epoch                       =        3.0
  eval_accuracy               =     0.7744
  eval_loss                   =     0.8599
  eval_runtime                = 0:00:13.30
  eval_samples                =        501
  eval_samples_per_second     =     38.681
  eval_steps_per_second       =      4.866
  max_memory_allocated (GB)   =      94.61
  memory_allocated (GB)       =      57.39
  perplexity                  =     2.3629
  total_memory_available (GB) =      94.62
```

With this branch:

```
***** train metrics *****
  epoch                       =         3.0
  max_memory_allocated (GB)   =       74.26
  memory_allocated (GB)       =       57.39
  total_flos                  = 703666335GF
  total_memory_available (GB) =       94.62
  train_loss                  =      0.9154
  train_runtime               =  0:32:21.30
  train_samples_per_second    =      19.383
  train_steps_per_second      =       1.213
06/05/2024 23:23:21 - INFO - __main__ -   *** Evaluate ***
[INFO|trainer.py:1779] 2024-06-05 23:23:21,480 >> ***** Running Evaluation *****
[INFO|trainer.py:1781] 2024-06-05 23:23:21,480 >>   Num examples = 501
[INFO|trainer.py:1784] 2024-06-05 23:23:21,480 >>   Batch size = 8
100%|██████████| 63/63 [00:12<00:00,  5.10it/s]
***** eval metrics *****
  epoch                       =        3.0
  eval_accuracy               =     0.7744
  eval_loss                   =     0.8599
  eval_runtime                = 0:00:13.44
  eval_samples                =        501
  eval_samples_per_second     =     39.102
  eval_steps_per_second       =      4.918
  max_memory_allocated (GB)   =      94.61
  memory_allocated (GB)       =      57.39
  perplexity                  =     2.3629
  total_memory_available (GB) =      94.62
```

So the results look the same with this branch. There are also other internal tests, and we haven't seen accuracy issues.
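
As a quick sanity check on the logs above (not part of them): perplexity is exp(eval_loss), so the identical eval_loss on both runs must give the identical perplexity:

```python
import math
# exp(0.8599) ≈ 2.363, matching the perplexity reported on both main and the branch.
print(math.exp(0.8599))
```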

@ssarkar2 ssarkar2 requested a review from regisss June 5, 2024 23:27
@regisss regisss merged commit 13a60b5 into main Jun 6, 2024
14 checks passed
@regisss regisss deleted the sasarkar/fusedrope_inp_bf16 branch June 6, 2024 09:11
imangohari1 pushed a commit to imangohari1/optimum-habana that referenced this pull request Jun 13, 2024