Experiment XPU QLora Finetuning (#8937)

* Support xpu finetuning
* support xpu finetuning
* fix style
* fix style
* fix style
* refine example
* add readme
* refine readme
* refine api
* fix fp16
* fix example
* refactor
* fix style
* fix compute type
* add qlora
* refine training args
* fix example
* fix style
* fast path for inference
* address comments
* refine readme
* revert lint
Showing 6 changed files with 373 additions and 8 deletions.
python/llm/example/gpu/qlora_finetuning/README.md (new file, 50 additions)
# Q-Lora (experimental support)

This example demonstrates how to finetune a llama2-7b model with BigDL-LLM 4bit optimizations on [Intel GPUs](../README.md).

## 0. Requirements
To run this example with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../README.md#recommended-requirements) for more information.

## Example: Finetune llama2-7b using QLoRA

This example is ported from [bnb-4bit-training](https://colab.research.google.com/drive/1VoYNfYDKcKRQRor98Zbf2-9VQTtGJ24k?usp=sharing).
### 1. Install

```bash
conda create -n llm python=3.9
conda activate llm
# The command below will install intel_extension_for_pytorch==2.0.110+xpu by default;
# you can install a specific ipex/torch version for your needs.
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
pip install git+https://github.com/huggingface/transformers.git@95fe0f5
pip install peft==0.5.0
```
### 2. Configure OneAPI environment variables

```bash
source /opt/intel/oneapi/setvars.sh
```
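Before launching the finetuning script, you can verify that the Intel GPU is actually visible to PyTorch. The snippet below is a minimal sanity check, assuming the IPEX XPU build installed in step 1; the `torch.xpu` namespace is registered by `intel_extension_for_pytorch`:

```python
# Minimal sanity check: confirm the XPU device is visible after sourcing setvars.sh.
import torch
import intel_extension_for_pytorch as ipex  # registers the 'xpu' device with PyTorch

print(torch.xpu.is_available())      # expected: True on a correctly configured machine
print(torch.xpu.get_device_name(0))  # prints the name of the first Intel GPU
```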
### 3. Run

```bash
python ./qlora_finetuning.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH
```
### Sample Output
```log
{'loss': 1.6134, 'learning_rate': 0.0002, 'epoch': 0.03}
{'loss': 1.3038, 'learning_rate': 0.00017777777777777779, 'epoch': 0.06}
{'loss': 1.2634, 'learning_rate': 0.00015555555555555556, 'epoch': 0.1}
{'loss': 1.2389, 'learning_rate': 0.00013333333333333334, 'epoch': 0.13}
{'loss': 1.0399, 'learning_rate': 0.00011111111111111112, 'epoch': 0.16}
{'loss': 1.0406, 'learning_rate': 8.888888888888889e-05, 'epoch': 0.19}
{'loss': 1.3114, 'learning_rate': 6.666666666666667e-05, 'epoch': 0.22}
{'loss': 0.9876, 'learning_rate': 4.4444444444444447e-05, 'epoch': 0.26}
{'loss': 1.1406, 'learning_rate': 2.2222222222222223e-05, 'epoch': 0.29}
{'loss': 1.1728, 'learning_rate': 0.0, 'epoch': 0.32}
{'train_runtime': 225.8005, 'train_samples_per_second': 3.543, 'train_steps_per_second': 0.886, 'train_loss': 1.211241865158081, 'epoch': 0.32}
100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [03:45<00:00,  1.13s/it]
TrainOutput(global_step=200, training_loss=1.211241865158081, metrics={'train_runtime': 225.8005, 'train_samples_per_second': 3.543, 'train_steps_per_second': 0.886, 'train_loss': 1.211241865158081, 'epoch': 0.32})
```
python/llm/example/gpu/qlora_finetuning/qlora_finetuning.py (new file, 84 additions)
```python
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import torch
import os

# Route Hugging Face Accelerate to Intel's IPEX / XPU backend.
os.environ["ACCELERATE_USE_IPEX"] = "true"
os.environ["ACCELERATE_USE_XPU"] = "true"

import transformers
from transformers import LlamaTokenizer

from peft import LoraConfig
import intel_extension_for_pytorch as ipex
from bigdl.llm.transformers.qlora import get_peft_model, prepare_model_for_kbit_training
from bigdl.llm.transformers import AutoModelForCausalLM
from datasets import load_dataset
import argparse

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Simple example of QLoRA finetuning for a Llama2 model using BigDL-LLM')
    parser.add_argument('--repo-id-or-model-path', type=str, default="meta-llama/Llama-2-7b-hf",
                        help='The huggingface repo id for the Llama2 model (e.g. `meta-llama/Llama-2-7b-hf` or `meta-llama/Llama-2-13b-chat-hf`) to be downloaded'
                             ', or the path to the huggingface checkpoint folder')
    parser.add_argument('--dataset', type=str, default="Abirate/english_quotes")

    args = parser.parse_args()
    model_path = args.repo_id_or_model_path
    dataset_path = args.dataset
    tokenizer = LlamaTokenizer.from_pretrained(model_path, trust_remote_code=True)

    # Tokenize the "quote" column of the dataset into input ids.
    data = load_dataset(dataset_path)
    data = data.map(lambda samples: tokenizer(samples["quote"]), batched=True)

    # Load the base model in BigDL-LLM 4-bit format, keeping lm_head unquantized.
    model = AutoModelForCausalLM.from_pretrained(model_path,
                                                 load_in_4bit=True,
                                                 optimize_model=False,
                                                 modules_to_not_convert=["lm_head"])
    model = model.to('xpu')
    model.gradient_checkpointing_enable()
    model = prepare_model_for_kbit_training(model)

    # Attach LoRA adapters to the attention projection layers.
    config = LoraConfig(
        r=8,
        lora_alpha=32,
        target_modules=["q_proj", "k_proj", "v_proj"],
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM"
    )
    model = get_peft_model(model, config)

    # Pad with token id 0 and left-pad so variable-length quotes can be batched.
    tokenizer.pad_token_id = 0
    tokenizer.padding_side = "left"
    trainer = transformers.Trainer(
        model=model,
        train_dataset=data["train"],
        args=transformers.TrainingArguments(
            per_device_train_batch_size=4,
            gradient_accumulation_steps=1,
            warmup_steps=20,
            max_steps=200,
            learning_rate=2e-4,
            fp16=False,  # fp16 is not supported yet
            logging_steps=20,
            output_dir="outputs",
            optim="adamw_hf",  # paged_adamw_8bit is not supported yet
            # gradient_checkpointing=True,  # can further reduce memory but is slower
        ),
        data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    model.config.use_cache = False  # silence the warnings; please re-enable for inference!
    result = trainer.train()
    print(result)
```
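The committed script stops at printing the `TrainOutput` and does not write the adapter to disk. As a sketch (not part of this commit), the standard PEFT calls below would persist the LoRA weights and reload them for inference later; the `outputs/lora-adapter` directory is an illustrative choice, not something the example defines:

```python
# Sketch only: save the LoRA adapter produced by trainer.train() and reload it.
# Assumes `model`, `model_path`, and `AutoModelForCausalLM` from the script above.
model.save_pretrained("outputs/lora-adapter")  # writes adapter_config.json + weights

from peft import PeftModel

# Rebuild the 4-bit base model and stack the saved adapter on top of it.
base_model = AutoModelForCausalLM.from_pretrained(model_path,
                                                  load_in_4bit=True,
                                                  optimize_model=False,
                                                  modules_to_not_convert=["lm_head"]).to('xpu')
inference_model = PeftModel.from_pretrained(base_model, "outputs/lora-adapter")
inference_model.config.use_cache = True  # re-enable the KV cache that training disabled
```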