
sasarkar/fusedrope inp bf16 #1026

Merged: 2 commits into main from sasarkar/fusedrope_inp_bf16 on Jun 6, 2024
Conversation

ssarkar2 (Collaborator)

What does this PR do?

Pulling in this commit: HabanaAI@1d44433
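
As a rough illustration of the change (the names and structure below are hypothetical sketches, not the actual HabanaAI kernel code): the difference is between upcasting bf16 inputs to fp32 around the RoPE op and passing them through in their native dtype.

```python
import torch

def apply_rope_fp32_roundtrip(q: torch.Tensor, cos: torch.Tensor,
                              sin: torch.Tensor, rope_fn) -> torch.Tensor:
    # Pre-change pattern: upcast bf16 inputs to fp32 around the fused op,
    # then cast the result back to the input dtype.
    return rope_fn(q.float(), cos.float(), sin.float()).to(q.dtype)

def apply_rope_native_dtype(q: torch.Tensor, cos: torch.Tensor,
                            sin: torch.Tensor, rope_fn) -> torch.Tensor:
    # Post-change pattern: the fused op accepts bf16 inputs directly,
    # skipping the fp32 round trip and the extra casts it costs.
    return rope_fn(q, cos, sin)
```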

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@ssarkar2 ssarkar2 requested review from mandy-li and libinta as code owners May 30, 2024 23:35
@ssarkar2 ssarkar2 requested a review from a user May 30, 2024 23:35
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

hsubramony added a commit that referenced this pull request May 31, 2024
regisss (Collaborator) left a comment


Have you checked the accuracy of the trained models with this change? RoPE is quite sensitive to the data type that is used to compute it.
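
For context on that sensitivity, here is a small self-contained check (illustrative only, not from this PR) of how far a rotate-half RoPE computed end to end in bf16 drifts from an fp32 reference:

```python
import torch

torch.manual_seed(0)
head_dim, seq_len = 128, 1024

# Standard RoPE frequency table (base 10000), as used by Llama-style models.
inv_freq = 1.0 / (10000 ** (torch.arange(0, head_dim, 2).float() / head_dim))
freqs = torch.outer(torch.arange(seq_len).float(), inv_freq)
emb = torch.cat((freqs, freqs), dim=-1)
cos, sin = emb.cos(), emb.sin()

def rope(x, cos, sin):
    # Rotate-half formulation of RoPE.
    x1, x2 = x.chunk(2, dim=-1)
    return x * cos + torch.cat((-x2, x1), dim=-1) * sin

q = torch.randn(seq_len, head_dim)
ref = rope(q, cos, sin)                                    # fp32 reference
low = rope(q.bfloat16(), cos.bfloat16(), sin.bfloat16())   # bf16 end to end

# bf16 keeps only ~8 mantissa bits, so expect absolute errors around 1e-2 here.
print((ref - low.float()).abs().max())
```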

ssarkar2 (Collaborator, Author) commented Jun 5, 2024

For a small text generation experiment:

```bash
python run_generation.py --model_name_or_path /mnt/weka/data/llama_inference/Llama-2-7b-chat-hf/ --use_kv_cache --max_new_tokens 100 --bf16 --batch_size 4 --use_hpu_graph --trim_logits --bucket_size 128 --bucket_internal --reuse_cache --use_flash_attention
python run_generation.py --model_name_or_path /mnt/weka/data/llama_inference/Llama-2-7b-chat-hf/ --use_kv_cache --max_new_tokens 1024 --max_input_tokens 1024 --bf16 --batch_size 4 --use_hpu_graph --trim_logits --bucket_size 128 --bucket_internal --reuse_cache --use_flash_attention
```

Got exactly the same outputs with and without the change.
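
(One hypothetical way to verify this, assuming each run's generations were redirected to a file; `out_main.txt` and `out_branch.txt` are made-up names:)

```python
# Hypothetical byte-for-byte comparison of saved generations, e.g. produced by
# `python run_generation.py ... > out_main.txt` on each branch.
with open("out_main.txt") as a, open("out_branch.txt") as b:
    assert a.read() == b.read(), "generation outputs diverged"
print("outputs identical")
```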

Training:

```bash
python3 run_lora_clm.py \
    --model_name_or_path pathto/Llama-2-7b-chat-hf/ \
    --dataset_name tatsu-lab/alpaca \
    --bf16 True \
    --output_dir ./model_lora_llama \
    --num_train_epochs 3 \
    --per_device_train_batch_size 16 \
    --evaluation_strategy "no" \
    --save_strategy "no" \
    --learning_rate 1e-4 \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "constant" \
    --max_grad_norm 0.3 \
    --logging_steps 1 \
    --do_train \
    --do_eval \
    --use_habana \
    --use_lazy_mode \
    --throughput_warmup_steps 3 \
    --lora_rank=8 \
    --lora_alpha=16 \
    --lora_dropout=0.05 \
    --lora_target_modules "q_proj" "v_proj" \
    --dataset_concatenation \
    --max_seq_length 512 \
    --low_cpu_mem_usage True \
    --validation_split_percentage 4 \
    --adam_epsilon 1e-08
```

On `main`:

```
***** train metrics *****
  epoch                       =         3.0
  max_memory_allocated (GB)   =       74.26
  memory_allocated (GB)       =       57.39
  total_flos                  = 703666335GF
  total_memory_available (GB) =       94.62
  train_loss                  =      0.9154
  train_runtime               =  0:32:29.62
  train_samples_per_second    =      19.317
  train_steps_per_second      =       1.208
06/05/2024 23:11:20 - INFO - __main__ -   *** Evaluate ***
[INFO|trainer.py:1779] 2024-06-05 23:11:20,349 >> ***** Running Evaluation *****
[INFO|trainer.py:1781] 2024-06-05 23:11:20,349 >>   Num examples = 501
[INFO|trainer.py:1784] 2024-06-05 23:11:20,349 >>   Batch size = 8
100%|██████████| 63/63 [00:12<00:00,  5.05it/s]
***** eval metrics *****
  epoch                       =        3.0
  eval_accuracy               =     0.7744
  eval_loss                   =     0.8599
  eval_runtime                = 0:00:13.30
  eval_samples                =        501
  eval_samples_per_second     =     38.681
  eval_steps_per_second       =      4.866
  max_memory_allocated (GB)   =      94.61
  memory_allocated (GB)       =      57.39
  perplexity                  =     2.3629
  total_memory_available (GB) =      94.62
```

With this branch:

```
***** train metrics *****
  epoch                       =         3.0
  max_memory_allocated (GB)   =       74.26
  memory_allocated (GB)       =       57.39
  total_flos                  = 703666335GF
  total_memory_available (GB) =       94.62
  train_loss                  =      0.9154
  train_runtime               =  0:32:21.30
  train_samples_per_second    =      19.383
  train_steps_per_second      =       1.213
06/05/2024 23:23:21 - INFO - __main__ -   *** Evaluate ***
[INFO|trainer.py:1779] 2024-06-05 23:23:21,480 >> ***** Running Evaluation *****
[INFO|trainer.py:1781] 2024-06-05 23:23:21,480 >>   Num examples = 501
[INFO|trainer.py:1784] 2024-06-05 23:23:21,480 >>   Batch size = 8
100%|██████████| 63/63 [00:12<00:00,  5.10it/s]
***** eval metrics *****
  epoch                       =        3.0
  eval_accuracy               =     0.7744
  eval_loss                   =     0.8599
  eval_runtime                = 0:00:13.44
  eval_samples                =        501
  eval_samples_per_second     =     39.102
  eval_steps_per_second       =      4.918
  max_memory_allocated (GB)   =      94.61
  memory_allocated (GB)       =      57.39
  perplexity                  =     2.3629
  total_memory_available (GB) =      94.62
```

So the results look the same with this branch. There are also other internal tests, and we haven't seen accuracy issues.
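
As a quick sanity check on the logs above (not part of them): perplexity is exp(eval_loss), so the identical eval_loss on both runs must give the identical perplexity:

```python
import math
# exp(0.8599) ≈ 2.363, matching the perplexity reported on both main and the branch.
print(math.exp(0.8599))
```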

@ssarkar2 ssarkar2 requested a review from regisss June 5, 2024 23:27
@regisss regisss merged commit 13a60b5 into main Jun 6, 2024
14 checks passed
@regisss regisss deleted the sasarkar/fusedrope_inp_bf16 branch June 6, 2024 09:11
imangohari1 pushed a commit to imangohari1/optimum-habana that referenced this pull request Jun 13, 2024