Experiment XPU QLora Finetuning #8937
Conversation
Force-pushed from 806c87c to c8a48df
Force-pushed from a0dfb57 to 00c4e8c
```diff
-if x_2d.shape[0] > 1 and x_2d.dtype == torch.float32:
+# sometimes fp16 cause nan and training instability
+# disable the conversion when training
+if self.conver_to_half and x_2d.shape[0] > 1 and x_2d.dtype == torch.float32:
     x_2d = x_2d.half()
```
Is `half` fp16 only? We can try bf16 for training later.
Potentially the user can still use fp16 or bf16 training via `autocast` even if `self.convert_to_half` is false. What we do here is to respect the user's intent if they decide to use fp32.
`autocast` often fails.
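To make the autocast option discussed above concrete, here is a minimal user-side sketch of bf16 autocast training on XPU. This is an illustrative assumption rather than code from this PR: the `training_step` helper is hypothetical, and `torch.xpu.amp.autocast` is assumed to be provided by the installed IPEX build.

```python
import torch
import intel_extension_for_pytorch as ipex  # registers the XPU backend

def training_step(model, batch, optimizer):
    """One step that keeps the weights in fp32 and lets autocast pick the
    compute dtype, instead of force-converting activations with .half()."""
    optimizer.zero_grad()
    # bf16 autocast on XPU; torch.xpu.amp.autocast is assumed to be exposed
    # by the installed IPEX build.
    with torch.xpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
        loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    return loss.item()
```

When this kind of autocast is not used, the `conver_to_half` flag above lets the layer stay in fp32 if the user asked for it.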
```python
    return model


def prepare_model_for_kbit_training(model, use_gradient_checkpointing=True):
```
Do we want to set `model.is_loaded_in_4bit` to true, so as to directly reuse the original `prepare_model_for_kbit_training`?
I guess we can, but I think it might trigger some other hard-coded `bitsandbytes`-related behavior in `transformers` (e.g. we cannot call `model.to(dtype)`), so this will require more testing. I can explore this option in follow-up PRs.
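For reference, the kind of preparation that peft's original `prepare_model_for_kbit_training` performs is sketched below. This is a simplified assumption of its behavior (the real helper also does dtype handling for norm layers and embeddings), not the implementation added in this PR.

```python
def prepare_model_for_kbit_training(model, use_gradient_checkpointing=True):
    # Freeze all base-model parameters; only the LoRA adapter weights will train.
    for param in model.parameters():
        param.requires_grad = False

    if use_gradient_checkpointing:
        # Make sure inputs to the frozen base model require grads, otherwise
        # gradient checkpointing would cut the adapter's backward path.
        if hasattr(model, "enable_input_require_grads"):
            model.enable_input_require_grads()
        model.gradient_checkpointing_enable()

    return model
```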
```bash
# you can install specific ipex/torch version for your need
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
pip install git+https://github.com/huggingface/transformers.git@95fe0f5
pip install peft
```
We need to specify a specific version of `peft` (and shall we add it to the dependencies?)
I added the `peft` version. I think we can add `peft` to the dependencies once we can get a stable `transformers` release, and then we can add/update them together.
LGTM
```diff
@@ -0,0 +1,50 @@
+# Q-Lora (experimental support)
+
+This example demonstrate how to finetune a llama2-7b model use Big-LLM 4bit optimizations using [Intel GPUs](../README.md).
```
demonstrate -> demonstrates
```md
This example demonstrate how to finetune a llama2-7b model use Big-LLM 4bit optimizations using [Intel GPUs](../README.md).

## 0. Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
```
these examples -> this example
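As a side note for readers of this README diff, here is a rough sketch of what attaching a LoRA adapter for QLoRA-style finetuning can look like. The `peft` calls are standard, but the low-bit model loading is only described in a comment because the exact BigDL-LLM loading API is an assumption here, and this PR ships its own QLoRA helpers that may wrap these calls.

```python
from peft import LoraConfig, get_peft_model

def attach_lora_adapter(model):
    """`model` is assumed to be a 4-bit low-bit model already loaded on an
    Intel GPU (the exact BigDL-LLM loading call is not shown in this diff)."""
    config = LoraConfig(
        r=8,
        lora_alpha=32,
        target_modules=["q_proj", "k_proj", "v_proj"],  # typical llama2 attention projections
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, config)
    model.print_trainable_parameters()  # only the adapter weights should be trainable
    return model
```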
```diff
@@ -21,4 +21,4 @@
     AutoModelForSequenceClassification, AutoModelForMaskedLM, \
     AutoModelForNextSentencePrediction, AutoModelForMultipleChoice, \
     AutoModelForTokenClassification
-from .modelling_bigdl import *
+from .modelling_bigdl import *
```
revert this?
```python
        return result


    @staticmethod
```
is this decorator needed?
Weirdly, yes. I did not add it at first, and it did not work.
Is there an example of using the finetuned model (save/load/inference)?
Working on that. Currently, the user can use the same code as in alpaca-lora to merge the trained adapter and export a merged checkpoint. Using the adapter directly without merging needs our own PeftModel.
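For anyone looking for the save/load/inference flow in the meantime, below is a hedged sketch of merging a trained adapter into the base model with standard `peft` APIs. The model name and paths are placeholders, not values from this PR, and as noted above, using the unmerged adapter directly would need this PR's own PeftModel.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Placeholder identifiers -- illustrative only, not taken from this PR.
BASE_MODEL = "meta-llama/Llama-2-7b-hf"
ADAPTER_PATH = "./qlora-output"

# Load the full-precision base model, attach the trained LoRA adapter,
# then fold the adapter weights into the base weights and save the result.
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, ADAPTER_PATH)
merged = model.merge_and_unload()
merged.save_pretrained("./llama2-7b-qlora-merged")
```

The merged checkpoint can then be loaded for inference like any regular `transformers` model.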
* Support xpu finetuning
* support xpu finetuning
* fix style
* fix style
* fix style
* refine example
* add readme
* refine readme
* refine api
* fix fp16
* fix example
* refactor
* fix style
* fix compute type
* add qlora
* refine training args
* fix example
* fix style
* fast path forinference
* address comments
* refine readme
* revert lint
Description
Example referenced: https://colab.research.google.com/drive/1VoYNfYDKcKRQRor98Zbf2-9VQTtGJ24k?usp=sharing
Current output on llama2-13b (bs=4, gradient_acc_steps=1, warmup_steps=20, max_steps=200)
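For reproducibility, the reported run settings (bs=4, gradient_acc_steps=1, warmup_steps=20, max_steps=200) map onto `transformers.TrainingArguments` roughly as below. Only those four values come from this PR; the remaining fields are illustrative assumptions.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    per_device_train_batch_size=4,   # bs=4 (reported above)
    gradient_accumulation_steps=1,   # reported above
    warmup_steps=20,                 # reported above
    max_steps=200,                   # reported above
    learning_rate=2e-4,              # assumed, not reported in this PR
    logging_steps=20,                # assumed, not reported in this PR
    output_dir="./qlora-output",     # assumed placeholder
)
```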