Finetune ALL LLMs with ALL Adapters on ALL Platforms!
Model | LoRA | QLoRA | AdaLoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
---|---|---|---|---|---|---|
Bloom | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
LLaMA | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
LLaMA2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
LLaMA3/3.1/3.2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
ChatGLM | ✅ | ✅ | ✅ | ☑️ | ☑️ | ☑️ |
ChatGLM2 | ✅ | ✅ | ✅ | ☑️ | ☑️ | ☑️ |
Qwen | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Baichuan | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Mixtral | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Phi | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Phi3 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Gemma | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
You can fine-tune LLMs on:
- Windows
- Linux
- Mac M1/2
You can load train/test data from:
- Terminal
- File
- Database
You can run various tasks:
- CausalLM (default)
- SequenceClassification
P.S. Unfortunately, SuperAdapters does not support QLoRA on Mac; please use LoRA or AdaLoRA instead.
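The platform restriction above can be checked before launching a run. The sketch below is only an illustration: `choose_adapter` is a hypothetical helper, not part of SuperAdapters, and it assumes that silently falling back from `qlora` to `lora` on macOS is acceptable for your workflow.

```python
import platform

# Adapter names accepted by the --adapter flag of finetune.py.
ADAPTERS = {"lora", "qlora", "adalora", "prompt", "p_tuning", "prefix"}


def choose_adapter(requested, system=None):
    """Return an adapter name usable on this OS (hypothetical helper)."""
    system = system or platform.system()  # "Darwin" on macOS
    if requested not in ADAPTERS:
        raise ValueError(f"unknown adapter: {requested}")
    # QLoRA is not supported on Mac, so fall back to plain LoRA there.
    if system == "Darwin" and requested == "qlora":
        return "lora"
    return requested
```

The returned value can then be passed to `--adapter`.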
CentOS:
yum install -y xz-devel
Ubuntu:
apt-get install -y liblzma-dev
MacOS:
brew install xz
P.S. You may need to recompile Python with xz support:
CPPFLAGS="-I$(brew --prefix xz)/include" pyenv install 3.10.0
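To see whether a recompile is needed at all, you can check that the interpreter was built with xz support. This is a minimal sketch, assuming the xz packages above are required for Python's standard `lzma` module; `has_lzma` is a name chosen here for illustration.

```python
import importlib


def has_lzma():
    """Return True if this Python build can import and use lzma."""
    try:
        lzma = importlib.import_module("lzma")
    except ImportError:
        # Python was compiled without the xz/liblzma headers.
        return False
    # Round-trip a small payload to confirm the extension actually works.
    payload = b"SuperAdapters"
    return lzma.decompress(lzma.compress(payload)) == payload


if __name__ == "__main__":
    print("lzma available:", has_lzma())
```

If this prints `False`, install the xz package for your OS and rebuild Python as shown above.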
If you want to use the GPU on Mac, please read How to use GPU on Mac.
P.S. Please make sure your macOS version is newer than 14.0!
pip uninstall torch torchvision torchaudio
pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
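After reinstalling PyTorch, a quick check can confirm whether the Mac GPU is visible. The sketch below assumes the GPU is reached through PyTorch's Metal (MPS) backend; `mps_available` is a hypothetical helper name.

```python
def mps_available():
    """Return True if PyTorch is installed and can use the MPS backend."""
    try:
        import torch
    except ImportError:
        return False
    # torch.backends.mps is missing on older PyTorch builds.
    mps = getattr(torch.backends, "mps", None)
    return bool(mps is not None and mps.is_available())


if __name__ == "__main__":
    print("MPS available:", mps_available())
```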
pip install -r requirements.txt
python finetune.py --model_type chatglm --data "data/train/" --model_path "LLMs/chatglm/chatglm-6b/" --adapter "lora" --output_dir "output/chatglm"
python inference.py --model_type chatglm --instruction "Who are you?" --model_path "LLMs/chatglm/chatglm-6b/" --adapter_weights "output/chatglm" --max_new_tokens 32
python finetune.py --model_type llama --data "data/train/" --model_path "LLMs/open-llama/open-llama-3b/" --adapter "lora" --output_dir "output/llama"
python inference.py --model_type llama --instruction "Who are you?" --model_path "LLMs/open-llama/open-llama-3b" --adapter_weights "output/llama" --max_new_tokens 32
python finetune.py --model_type qwen --data "data/train/" --model_path "LLMs/Qwen/Qwen-7b-chat" --adapter "lora" --output_dir "output/Qwen"
python inference.py --model_type qwen --instruction "Who are you?" --model_path "LLMs/Qwen/Qwen-7b-chat" --adapter_weights "output/Qwen" --max_new_tokens 32
Other LLMs follow the same usage as above.
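Because the commands differ only in model type and paths, they can be generated from a small helper. This is a sketch using only the flags shown above; the helper names are hypothetical and the `LLMs/bloom/my-bloom` path is a placeholder, not a path shipped with the project.

```python
import shlex


def finetune_cmd(model_type, model_path, output_dir, adapter="lora", data="data/train/"):
    """Build the argv list for finetune.py."""
    return [
        "python", "finetune.py",
        "--model_type", model_type,
        "--data", data,
        "--model_path", model_path,
        "--adapter", adapter,
        "--output_dir", output_dir,
    ]


def inference_cmd(model_type, model_path, adapter_weights, instruction, max_new_tokens=32):
    """Build the argv list for inference.py."""
    return [
        "python", "inference.py",
        "--model_type", model_type,
        "--instruction", instruction,
        "--model_path", model_path,
        "--adapter_weights", adapter_weights,
        "--max_new_tokens", str(max_new_tokens),
    ]


if __name__ == "__main__":
    # Placeholder model path; replace it with your own checkpoint directory.
    print(shlex.join(finetune_cmd("bloom", "LLMs/bloom/my-bloom", "output/bloom")))
    print(shlex.join(inference_cmd("bloom", "LLMs/bloom/my-bloom", "output/bloom", "Who are you?")))
```

The lists can be passed to `subprocess.run` directly, which avoids shell quoting issues with the instruction text.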
For classification, you need to specify `--task_type classify` and `--labels`:
python finetune.py --model_type llama --data "data/train/alpaca_tiny_classify.json" --model_path "LLMs/open-llama/open-llama-3b" --adapter "lora" --output_dir "output/llama" --task_type classify --labels '["0", "1"]' --disable_wandb
python inference.py --model_type llama --data "data/train/alpaca_tiny_classify.json" --model_path "LLMs/open-llama/open-llama-3b" --adapter_weights "output/llama" --task_type classify --labels '["0", "1"]' --disable_wandb
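The `--labels` value in the commands above looks like a JSON array of strings. Assuming the scripts parse it as JSON (not confirmed here), the argument can be generated rather than typed by hand; `labels_arg` is a hypothetical helper.

```python
import json
import shlex


def labels_arg(labels):
    """Serialize class labels as a JSON array of strings for --labels."""
    return json.dumps([str(label) for label in labels])


if __name__ == "__main__":
    arg = labels_arg([0, 1])
    # shlex.quote adds the single quotes a POSIX shell needs around the JSON.
    print("--labels", shlex.quote(arg))
```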
- Install MySQL and put the database configuration into environment variables.
E.g.
export LLM_DB_HOST='127.0.0.1'
export LLM_DB_PORT=3306
export LLM_DB_USERNAME='YOURUSERNAME'
export LLM_DB_PASSWORD='YOURPASSWORD'
export LLM_DB_NAME='YOURDBNAME'
- Create the necessary tables:
source xxxx.sql
- db_iteration: [train/test] The name of the record set.
- db_type: [test] Whether the record belongs to the "train" or "test" set.
- db_test_iteration: [test] The name of the test record set.
- Fine-tune (using ChatGLM as an example):
python finetune.py --model_type chatglm --fromdb --db_iteration xxxxxx --model_path "LLMs/chatglm/chatglm-6b/" --adapter "lora" --output_dir "output/chatglm" --disable_wandb
- eval
python inference.py --model_type chatglm --fromdb --db_iteration xxxxxx --db_type 'test' --db_test_iteration yyyyyyy --model_path "LLMs/chatglm/chatglm-6b/" --adapter_weights "output/chatglm" --max_new_tokens 6
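Before running with `--fromdb`, it can help to verify that the `LLM_DB_*` variables described above are all set. The sketch below is a hypothetical helper, not the project's own loader; the returned key names (`host`, `user`, `database`, ...) are an assumption chosen to match common MySQL client keyword arguments.

```python
import os


def load_db_config(env=None):
    """Collect the LLM_DB_* settings into a dict, failing fast if any are missing."""
    env = os.environ if env is None else env
    required = ["LLM_DB_HOST", "LLM_DB_USERNAME", "LLM_DB_PASSWORD", "LLM_DB_NAME"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "host": env["LLM_DB_HOST"],
        # Fall back to MySQL's default port when LLM_DB_PORT is unset.
        "port": int(env.get("LLM_DB_PORT", 3306)),
        "user": env["LLM_DB_USERNAME"],
        "password": env["LLM_DB_PASSWORD"],
        "database": env["LLM_DB_NAME"],
    }
```

Passing a plain dict as `env` makes the function easy to test without touching the real environment.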
usage: finetune.py [-h] [--data DATA] [--model_type {llama,llama2,llama3,chatglm,chatglm2,bloom,qwen,baichuan,mixtral,phi,gemma}] [--task_type {seq2seq,classify}] [--labels LABELS] [--model_path MODEL_PATH]
[--output_dir OUTPUT_DIR] [--disable_wandb] [--adapter {lora,qlora,adalora,prompt,p_tuning,prefix}] [--lora_r LORA_R] [--lora_alpha LORA_ALPHA] [--lora_dropout LORA_DROPOUT]
[--lora_target_modules LORA_TARGET_MODULES [LORA_TARGET_MODULES ...]] [--adalora_init_r ADALORA_INIT_R] [--adalora_tinit ADALORA_TINIT] [--adalora_tfinal ADALORA_TFINAL]
[--adalora_delta_t ADALORA_DELTA_T] [--num_virtual_tokens NUM_VIRTUAL_TOKENS] [--mapping_hidden_dim MAPPING_HIDDEN_DIM] [--epochs EPOCHS] [--learning_rate LEARNING_RATE]
[--cutoff_len CUTOFF_LEN] [--val_set_size VAL_SET_SIZE] [--group_by_length] [--logging_steps LOGGING_STEPS] [--load_8bit] [--add_eos_token]
[--resume_from_checkpoint [RESUME_FROM_CHECKPOINT]] [--per_gpu_train_batch_size PER_GPU_TRAIN_BATCH_SIZE] [--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS] [--fromdb]
[--db_iteration DB_ITERATION]
Finetune for all.
optional arguments:
-h, --help show this help message and exit
--data DATA the data used for instructing tuning
--model_type {llama,llama2,llama3,chatglm,chatglm2,bloom,qwen,baichuan,mixtral,phi,gemma}
--task_type {seq2seq,classify}
--labels LABELS Labels to classify, only used when task_type is classify
--model_path MODEL_PATH
--output_dir OUTPUT_DIR
The DIR to save the model
--disable_wandb Disable report to wandb
--adapter {lora,qlora,adalora,prompt,p_tuning,prefix}
--lora_r LORA_R
--lora_alpha LORA_ALPHA
--lora_dropout LORA_DROPOUT
--lora_target_modules LORA_TARGET_MODULES [LORA_TARGET_MODULES ...]
the module to be injected, e.g. q_proj/v_proj/k_proj/o_proj for llama, query_key_value for bloom&GLM
--adalora_init_r ADALORA_INIT_R
--adalora_tinit ADALORA_TINIT
number of warmup steps for AdaLoRA wherein no pruning is performed
--adalora_tfinal ADALORA_TFINAL
fix the resulting budget distribution and fine-tune the model for tfinal steps when using AdaLoRA
--adalora_delta_t ADALORA_DELTA_T
interval of steps for AdaLoRA to update rank
--num_virtual_tokens NUM_VIRTUAL_TOKENS
--mapping_hidden_dim MAPPING_HIDDEN_DIM
--epochs EPOCHS
--learning_rate LEARNING_RATE
--cutoff_len CUTOFF_LEN
--val_set_size VAL_SET_SIZE
--group_by_length
--logging_steps LOGGING_STEPS
--load_8bit
--add_eos_token
--resume_from_checkpoint [RESUME_FROM_CHECKPOINT]
resume from the specified or the latest checkpoint, e.g. `--resume_from_checkpoint [path]` or `--resume_from_checkpoint`
--per_gpu_train_batch_size PER_GPU_TRAIN_BATCH_SIZE
Batch size per GPU/CPU for training.
--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS
--fromdb
--db_iteration DB_ITERATION
The record's set name.
--db_item_num DB_ITEM_NUM
The Limit Num of train/test items selected from DB.
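Two entries in the usage text have less common shapes: `--lora_target_modules` takes one or more values, and `--resume_from_checkpoint` takes an optional value. The sketch below reproduces those shapes with `argparse`; it is an assumption about how such options are typically declared, not a copy of `finetune.py`.

```python
import argparse


def build_parser():
    """Minimal parser mirroring three option shapes from the usage text."""
    parser = argparse.ArgumentParser(description="Finetune for all.")
    # One or more module names, e.g. --lora_target_modules q_proj v_proj
    parser.add_argument("--lora_target_modules", nargs="+", default=None)
    # Bare flag -> True (resume from the latest checkpoint);
    # flag with a value -> that checkpoint path; flag absent -> False.
    parser.add_argument("--resume_from_checkpoint", nargs="?", const=True, default=False)
    # Plain on/off switch.
    parser.add_argument("--disable_wandb", action="store_true")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--lora_target_modules", "q_proj", "v_proj", "--resume_from_checkpoint"]
    )
    print(args)
```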
usage: inference.py [-h] [--debug] [--web] [--api] [--instruction INSTRUCTION] [--input INPUT] [--max_input MAX_INPUT] [--test_data_path TEST_DATA_PATH]
[--model_type {llama,llama2,llama3,chatglm,chatglm2,bloom,qwen,baichuan,mixtral,phi,phi3,gemma}] [--task_type {seq2seq,classify}] [--labels LABELS] [--model_path MODEL_PATH]
[--adapter_weights ADAPTER_WEIGHTS] [--load_8bit] [--temperature TEMPERATURE] [--top_p TOP_P] [--top_k TOP_K] [--max_new_tokens MAX_NEW_TOKENS] [--vllm] [--fromdb] [--db_type DB_TYPE]
[--db_iteration DB_ITERATION] [--db_test_iteration DB_TEST_ITERATION] [--db_item_num DB_ITEM_NUM]
Inference for all.
optional arguments:
-h, --help show this help message and exit
--debug Debug Mode to output detail info
--web Web Demo to try the inference
--api API to try the inference
--instruction INSTRUCTION
--input INPUT
--max_input MAX_INPUT
Limit the input length to avoid OOM or other bugs
--test_data_path TEST_DATA_PATH
The DIR of test data
--model_type {llama,llama2,llama3,chatglm,chatglm2,bloom,qwen,baichuan,mixtral,phi,phi3,gemma}
--task_type {seq2seq,classify}
--labels LABELS Labels to classify, only used when task_type is classify
--model_path MODEL_PATH
--adapter_weights ADAPTER_WEIGHTS
The DIR of adapter weights
--load_8bit
--temperature TEMPERATURE
temperature higher, LLM is more creative
--top_p TOP_P
--top_k TOP_K
--max_new_tokens MAX_NEW_TOKENS
--vllm Use vllm to accelerate inference.
--fromdb
--db_type DB_TYPE The record is whether 'train' or 'test'.
--db_iteration DB_ITERATION
The record's set name.
--db_test_iteration DB_TEST_ITERATION
The record's test set name.
--db_item_num DB_ITEM_NUM
The Limit Num of train/test items selected from DB.
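The sampling options `--temperature`, `--top_k`, and `--top_p` are easier to reason about with a toy example. The sketch below shows how these knobs are commonly defined, in pure Python on a three-token vocabulary; it is a generic illustration, not SuperAdapters' decoding code, and the function names are made up for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a higher temperature flattens the result."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(probs, top_k=0, top_p=1.0):
    """Keep the top_k most likely tokens, then the smallest prefix of them
    whose cumulative probability reaches top_p, and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]
    print("T=0.5:", softmax(logits, 0.5))
    print("T=2.0:", softmax(logits, 2.0))
    print("top_k=2:", filter_top_k_top_p(softmax(logits), top_k=2))
```

Lower temperatures concentrate probability on the most likely token, while `top_k` and `top_p` remove unlikely tokens before sampling.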
Use vLLM:
- Combine the base model and the adapter weights:
python tool.py combine --model_type llama3 --model_path "LLMs/llama3.1/" --adapter_weights "output/llama3.1/" --output_dir "output/llama3.1-combined/"
- Install the dependencies and start the vLLM server (see the Help Link).
- Run inference with the `--vllm` option:
python inference.py --model_type llama3 --instruction "Who are you?" --model_path "/root/SuperAdapters/output/llama3.1-combined" --vllm --max_new_tokens 32
usage: tool.py combine [-h] [--model_type {llama,llama2,llama3,chatglm,chatglm2,bloom,qwen,baichuan,mixtral,phi,phi3,gemma}] [--model_path MODEL_PATH] [--adapter_weights ADAPTER_WEIGHTS]
[--output_dir OUTPUT_DIR] [--max_shard_size MAX_SHARD_SIZE]
optional arguments:
-h, --help show this help message and exit
--model_type {llama,llama2,llama3,chatglm,chatglm2,bloom,qwen,baichuan,mixtral,phi,phi3,gemma}
--model_path MODEL_PATH
--adapter_weights ADAPTER_WEIGHTS
The DIR of adapter weights
--output_dir OUTPUT_DIR
The DIR to save the model
--max_shard_size MAX_SHARD_SIZE
Max size of each of the combined model weight, like 1GB,5GB,etc.
python tool.py combine --model_type llama --model_path "LLMs/open-llama/open-llama-3b/" --adapter_weights "output/llama/" --output_dir "output/combine/"
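For LoRA-style adapters, combining usually means folding the adapter deltas into the base weights and saving a standalone model. The sketch below shows that idea with the `peft` and `transformers` libraries; it is an assumption about what such a step involves, not the actual implementation of `tool.py`, and it only covers causal language models.

```python
def combine(model_path, adapter_weights, output_dir, max_shard_size="5GB"):
    """Merge LoRA adapter weights into the base model and save the result."""
    # Imported lazily so this module loads even without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(model_path)
    model = PeftModel.from_pretrained(base, adapter_weights)
    # Fold the low-rank updates into the base weights and drop the adapter wrappers.
    merged = model.merge_and_unload()
    # max_shard_size limits the size of each saved weight file, e.g. "1GB" or "5GB".
    merged.save_pretrained(output_dir, max_shard_size=max_shard_size)
    # Save the tokenizer alongside so output_dir can be loaded on its own.
    AutoTokenizer.from_pretrained(model_path).save_pretrained(output_dir)
```

The resulting directory can be loaded like any other model path, without `--adapter_weights`.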
Add the "--web" parameter
python inference.py --model_type phi --model_path "LLMs/phi/phi-2" --web
Add the "--api" parameter
python inference.py --model_type phi --model_path "LLMs/phi/phi-2" --api
python web/label.py
python web/label.py --type chat