
Experiment XPU QLora Finetuning #8937

Merged: 22 commits merged into intel-analytics:main on Sep 19, 2023

Conversation

@yangw1234 (Contributor) commented Sep 10, 2023

Description

Example referenced: https://colab.research.google.com/drive/1VoYNfYDKcKRQRor98Zbf2-9VQTtGJ24k?usp=sharing

Current output on llama2-13b (bs=4, gradient_acc_steps=1, warmup_steps=20, max_steps=200):

{'loss': 1.5875, 'learning_rate': 0.0002, 'epoch': 0.03}                                                                                                                                                                                            
{'loss': 1.2684, 'learning_rate': 0.00017777777777777779, 'epoch': 0.06}                                                                                                                                                                            
{'loss': 1.1366, 'learning_rate': 0.00015555555555555556, 'epoch': 0.1}                                                                                                                                                                             
{'loss': 1.046, 'learning_rate': 0.00013333333333333334, 'epoch': 0.13}                                                                                                                                                                             
{'loss': 0.877, 'learning_rate': 0.00011111111111111112, 'epoch': 0.16}                                                                                                                                                                             
{'loss': 0.8815, 'learning_rate': 8.888888888888889e-05, 'epoch': 0.19}                                                                                                                                                                             
{'loss': 1.0122, 'learning_rate': 6.666666666666667e-05, 'epoch': 0.22}                                                                                                                                                                             
{'loss': 0.8274, 'learning_rate': 4.4444444444444447e-05, 'epoch': 0.26}                                                                                                                                                                            
{'loss': 0.9267, 'learning_rate': 2.2222222222222223e-05, 'epoch': 0.29}                                                                                                                                                                            
{'loss': 0.8397, 'learning_rate': 0.0, 'epoch': 0.32}                                                                                                                                                                                               
{'train_runtime': 369.092, 'train_samples_per_second': 2.167, 'train_steps_per_second': 0.542, 'train_loss': 1.0403059482574464, 'epoch': 0.32}                                                                                                     
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [06:09<00:00,  1.85s/it]
TrainOutput(global_step=200, training_loss=1.0403059482574464, metrics={'train_runtime': 369.092, 'train_samples_per_second': 2.167, 'train_steps_per_second': 0.542, 'train_loss': 1.0403059482574464, 'epoch': 0.32})
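
For reference, below is a minimal sketch of the kind of QLoRA finetuning loop the referenced Colab walks through, written against the stock transformers/peft/bitsandbytes stack (this PR replaces the bitsandbytes pieces with BigDL-LLM's own 4-bit support on Intel XPU). The model id, dataset, and LoRA settings are illustrative assumptions; the TrainingArguments mirror the logged run above (bs=4, gradient_acc_steps=1, warmup_steps=20, max_steps=200, lr=2e-4).

import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-2-13b-hf"          # assumed model id, not specified in this PR
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

data = load_dataset("Abirate/english_quotes")   # small demo dataset; an assumption, not necessarily the PR's example
data = data.map(lambda samples: tokenizer(samples["quote"]), batched=True)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(r=8, lora_alpha=32, lora_dropout=0.05,
                         target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=4,      # bs=4, as in the logged run
    gradient_accumulation_steps=1,
    warmup_steps=20,
    max_steps=200,
    learning_rate=2e-4,                 # matches the initial learning rate in the log
    logging_steps=20,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=data["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()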

@yangw1234 yangw1234 changed the title [WIP] Experiment XPU QLora Finetuning Experiment XPU QLora Finetuning Sep 12, 2023
@yangw1234 yangw1234 changed the title Experiment XPU QLora Finetuning [WIP] Experiment XPU QLora Finetuning Sep 14, 2023
@yangw1234 yangw1234 changed the title [WIP] Experiment XPU QLora Finetuning Experiment XPU QLora Finetuning Sep 14, 2023
-        if x_2d.shape[0] > 1 and x_2d.dtype == torch.float32:
+        # sometimes fp16 causes NaN and training instability;
+        # disable the conversion when training
+        if self.conver_to_half and x_2d.shape[0] > 1 and x_2d.dtype == torch.float32:
             x_2d = x_2d.half()
Contributor:

Is half fp16 only? We can try bf16 for training later.

Contributor Author:

Potentially, the user can still use fp16 or bf16 training via autocast even if self.convert_to_half is false. What we do here is respect the user's intent if they decide to use fp32.
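
For illustration, a hedged sketch of what such user-driven mixed precision could look like; whether torch.autocast accepts device_type="xpu" depends on the torch/IPEX build, so treat this as an assumption rather than code from this PR.

import torch

# assumption: an XPU-enabled PyTorch/IPEX build where autocast supports "xpu"
with torch.autocast(device_type="xpu", dtype=torch.bfloat16):
    outputs = model(**batch)    # model and batch come from the surrounding training loop
    loss = outputs.loss
loss.backward()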

Contributor:

autocast often fails

return model


def prepare_model_for_kbit_training(model, use_gradient_checkpointing=True):
Contributor @jason-dai, Sep 19, 2023:

Do we want to set model.is_loaded_in_4bit to True, so that we can directly reuse the original prepare_model_for_kbit_training?

Contributor Author @yangw1234, Sep 19, 2023:

I guess we can, but I think it might trigger some other hard-coded bitsandbytes-related behavior in transformers (e.g. we cannot call model.to(dtype)), so this would require more testing. I can explore this option in follow-up PRs.
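
For illustration only, a rough sketch of the reviewer's suggestion; whether setting this flag is safe with BigDL-LLM 4-bit models is exactly the open question above.

from peft import prepare_model_for_kbit_training

# hypothetical: mark the BigDL 4-bit model so peft's stock helper accepts it,
# instead of shipping a custom prepare_model_for_kbit_training
model.is_loaded_in_4bit = True
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)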

# you can install a specific ipex/torch version for your needs
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
pip install git+https://github.com/huggingface/transformers.git@95fe0f5
pip install peft
Contributor:

Need to specify the specific version of peft (and shall we add it to the dependencies?)

Contributor Author:

I added the peft version. I think we can add peft to the dependencies once we get a stable transformers release, and then we can add/update them together.

@jason-dai (Contributor) left a comment:

LGTM

@@ -0,0 +1,50 @@
# Q-Lora (experimental support)

This example demonstrate how to finetune a llama2-7b model use Big-LLM 4bit optimizations using [Intel GPUs](../README.md).
Contributor:

demonstrate -> demonstrates

This example demonstrate how to finetune a llama2-7b model use Big-LLM 4bit optimizations using [Intel GPUs](../README.md).

## 0. Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
Contributor:

these examples -> this example

@@ -21,4 +21,4 @@
AutoModelForSequenceClassification, AutoModelForMaskedLM, \
AutoModelForNextSentencePrediction, AutoModelForMultipleChoice, \
AutoModelForTokenClassification
from .modelling_bigdl import *
from .modelling_bigdl import *
Contributor:

revert this?

return result


@staticmethod
Contributor:

is this decorator needed?

Contributor Author:

Weirdly, yes. I did not add it at first, and it did not work without it.

@yangw1234 yangw1234 merged commit ac542ab into intel-analytics:main Sep 19, 2023
16 checks passed
@jason-dai (Contributor):

Is there an example of using the finetuned model (save/load/inference)?

@yangw1234 (Contributor Author):

> Is there an example of using the finetuned model (save/load/inference)?

Working on that.

Currently, users can use the same code as in alpaca-lora to merge the trained adapter and export a merged checkpoint (from peft import PeftModel).

Using the adapter directly without merging requires our own PeftModel (from bigdl.llm.transformers.qlora import PeftModel), so there are some discrepancies, and I am looking for ways to resolve them.
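
For reference, a minimal sketch of the merge-and-export path described above, following the usual peft pattern (the model name and paths are placeholders, not taken from this PR):

from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", torch_dtype="auto")
model = PeftModel.from_pretrained(base, "path/to/qlora-adapter")  # adapter directory produced by training
merged = model.merge_and_unload()                                 # fold the LoRA weights into the base model
merged.save_pretrained("path/to/merged-checkpoint")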

liu-shaojun pushed a commit that referenced this pull request Mar 25, 2024
* Support xpu finetuning

* support xpu finetuning

* fix style

* fix style

* fix style

* refine example

* add readme

* refine readme

* refine api

* fix fp16

* fix example

* refactor

* fix style

* fix compute type

* add qlora

* refine training args

* fix example

* fix style

* fast path for inference

* address comments

* refine readme

* revert lint