Description
I recently found that when fine-tuning with alpaca-lora, model.save_pretrained() saves an adapter_model.bin that is only 443 B. This seems to have started with peft@75808eb2a6e7b4c3ed8aec003b6eeb30a2db1495.
Normally adapter_model.bin should be > 16 MB. When the 443 B adapter_model.bin is loaded, the model behaves as if it had not been fine-tuned at all. In contrast, loading the other checkpoints from the same training run works as expected.
drwxrwxr-x 2 ubuntu ubuntu 4.0K Apr 9 12:55 .
drwxrwxr-x 7 ubuntu ubuntu 4.0K Apr 9 12:54 ..
-rw-rw-r-- 1 ubuntu ubuntu 350 Apr 9 12:55 adapter_config.json
-rw-rw-r-- 1 ubuntu ubuntu 443 Apr 9 12:55 adapter_model.bin
drwxr-xr-x 2 ubuntu ubuntu 4.0K Apr 9 12:06 checkpoint-400
drwxr-xr-x 2 ubuntu ubuntu 4.0K Apr 9 12:06 checkpoint-600
drwxr-xr-x 2 ubuntu ubuntu 4.0K Apr 9 12:07 checkpoint-800
I'm not sure whether this is a peft issue, or whether it duplicates an existing issue, but I'm leaving it here for reference.
I've tested with multiple versions of peft:
072da6d9d62: works
382b178911edff38c1ff619bbac2ba556bd2276b: works
75808eb2a6e7b4c3ed8aec003b6eeb30a2db1495: not working
445940fb7b5d38390ffb6707e2a989e89fff03b5: not working
1a6151b91fcdcc25326b9807d7dbf54e091d506c: not working
1117d4772109a098787ce7fc297cb6cd641de6eb: not working
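For anyone who wants to repeat the per-commit comparison, here is a minimal sketch that builds the pip command for installing peft at a given commit. It assumes peft is installed straight from the huggingface/peft GitHub repository using pip's git+URL@commit syntax; the helper name pip_install_command is hypothetical and only for illustration.

```python
import sys

# Assumption: peft is installed from its GitHub repository.
PEFT_REPO = "https://github.com/huggingface/peft.git"

# Commits listed in this report, first known-bad one last.
COMMITS = [
    "382b178911edff38c1ff619bbac2ba556bd2276b",  # works
    "75808eb2a6e7b4c3ed8aec003b6eeb30a2db1495",  # not working
]


def pip_install_command(commit):
    """Return the argv list that installs peft at the given commit."""
    return [
        sys.executable, "-m", "pip", "install", "--force-reinstall",
        f"git+{PEFT_REPO}@{commit}",
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing is changed
    # in the current environment by accident.
    for commit in COMMITS:
        print(" ".join(pip_install_command(commit)))
```

Each printed command can be run inside the test environment before re-running finetune.py.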
Steps to reproduce:
conda create python=3.8 -n test
conda activate test
git clone https://github.com/tloen/alpaca-lora.git
cd alpaca-lora
pip install -r requirements.txt
# work around AttributeError: bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats
cd /home/ubuntu/miniconda3/envs/test/lib/python3.8/site-packages/bitsandbytes/
mv libbitsandbytes_cpu.so libbitsandbytes_cpu.so.bak
cp libbitsandbytes_cuda121.so libbitsandbytes_cpu.so
cd -
conda install cudatoolkit
# alpaca_data_cleaned_first_100.json is alpaca_data_cleaned.json truncated to its first 100 items; --val_set_size is set to 0 because there is not enough data to build the test set
python finetune.py --base_model 'decapoda-research/llama-7b-hf' --data_path '/data/datasets/alpaca_data_cleaned_first_100.json' --output_dir './lora-alpaca' --val_set_size 0
$ ls -alh lora-alpaca
total 16K
drwxrwxr-x 2 ubuntu ubuntu 4.0K Apr 9 12:55 .
drwxrwxr-x 7 ubuntu ubuntu 4.0K Apr 9 12:54 ..
-rw-rw-r-- 1 ubuntu ubuntu 350 Apr 9 12:55 adapter_config.json
-rw-rw-r-- 1 ubuntu ubuntu 443 Apr 9 12:55 adapter_model.bin
(adapter_model.bin should normally be around 16 MB)
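A quick way to catch this right after training is to check the size of the saved file. The sketch below is only an illustration: the helper name adapter_looks_empty and the 1 MB threshold are assumptions chosen to separate the 443 B file from the expected ~16 MB one, not anything defined by peft or alpaca-lora.

```python
import os


def adapter_looks_empty(path, min_bytes=1_000_000):
    """Return True if the adapter file is suspiciously small.

    min_bytes is an assumed threshold: a healthy adapter_model.bin in this
    report is around 16 MB, while the broken one is 443 B.
    """
    return os.path.getsize(path) < min_bytes


if __name__ == "__main__":
    adapter_path = os.path.join("lora-alpaca", "adapter_model.bin")
    if os.path.exists(adapter_path):
        size = os.path.getsize(adapter_path)
        status = "suspiciously small" if adapter_looks_empty(adapter_path) else "ok"
        print(f"{adapter_path}: {size} bytes ({status})")
    else:
        print(f"{adapter_path} not found")
```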
Running on Lambda Cloud A10 instance.