
model.save_pretrained() produced a corrupted adapter_model.bin (only 443 B) with alpaca-lora #286

Closed
@zetavg

Description

I recently found that when fine-tuning with alpaca-lora, model.save_pretrained() saves an adapter_model.bin that is only 443 B.

This seems to have started after peft@75808eb2a6e7b4c3ed8aec003b6eeb30a2db1495.

Normally, adapter_model.bin should be larger than 16 MB. When the 443 B adapter_model.bin is loaded, the model behaves as if it had not been fine-tuned at all. In contrast, loading other checkpoints from the same training run works as expected.

drwxrwxr-x 2 ubuntu ubuntu 4.0K Apr  9 12:55 .
drwxrwxr-x 7 ubuntu ubuntu 4.0K Apr  9 12:54 ..
-rw-rw-r-- 1 ubuntu ubuntu  350 Apr  9 12:55 adapter_config.json
-rw-rw-r-- 1 ubuntu ubuntu  443 Apr  9 12:55 adapter_model.bin
drwxr-xr-x 2 ubuntu ubuntu 4.0K Apr  9 12:06 checkpoint-400
drwxr-xr-x 2 ubuntu ubuntu 4.0K Apr  9 12:06 checkpoint-600
drwxr-xr-x 2 ubuntu ubuntu 4.0K Apr  9 12:07 checkpoint-800

I'm not sure whether this is an issue with peft, or whether it duplicates another issue, but I'm leaving it here for reference.
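As a quick sanity check before loading an adapter, a helper like this (the threshold and function name are my own, not part of peft) flags files that are implausibly small, such as the 443 B one above:

```python
import os

# Rough lower bound for a healthy LoRA adapter: a normal alpaca-lora
# adapter_model.bin is > 16 MB, while the corrupted one is only 443 B.
MIN_ADAPTER_BYTES = 1 * 1024 * 1024  # 1 MiB


def adapter_looks_corrupted(path: str) -> bool:
    """Return True if the saved adapter file is suspiciously small."""
    return os.path.getsize(path) < MIN_ADAPTER_BYTES
```

One could call this on `os.path.join(output_dir, "adapter_model.bin")` right after `model.save_pretrained(output_dir)` to fail fast instead of discovering the problem at inference time.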

I've been testing with multiple versions of peft:

  • 072da6d9d62 works
  • 382b178911edff38c1ff619bbac2ba556bd2276b works
  • 75808eb2a6e7b4c3ed8aec003b6eeb30a2db1495 not working
  • 445940fb7b5d38390ffb6707e2a989e89fff03b5 not working
  • 1a6151b91fcdcc25326b9807d7dbf54e091d506c not working
  • 1117d4772109a098787ce7fc297cb6cd641de6eb not working
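To bracket the regression between the commits above, each one can be installed directly from GitHub with pip's VCS syntax. This sketch only prints the install commands for the last known-good and first known-bad hashes; run the printed command in the test environment:

```shell
# Last known-good and first known-bad peft commits from the list above.
good=382b178911edff38c1ff619bbac2ba556bd2276b
bad=75808eb2a6e7b4c3ed8aec003b6eeb30a2db1495

# Print the pip command for each commit; paste the one you want to test.
for c in "$good" "$bad"; do
  echo "pip install git+https://github.com/huggingface/peft.git@$c"
done
```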

Steps to reproduce:

conda create python=3.8 -n test
conda activate test
git clone https://github.com/tloen/alpaca-lora.git
cd alpaca-lora
pip install -r requirements.txt

# to work around AttributeError: bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats
cd /home/ubuntu/miniconda3/envs/test/lib/python3.8/site-packages/bitsandbytes/
mv libbitsandbytes_cpu.so libbitsandbytes_cpu.so.bak
cp libbitsandbytes_cuda121.so libbitsandbytes_cpu.so
cd -
conda install cudatoolkit

# alpaca_data_cleaned_first_100.json is alpaca_data_cleaned.json trimmed to the first 100 items; --val_set_size 0 is set because there isn't enough data to build a validation set
python finetune.py --base_model 'decapoda-research/llama-7b-hf' --data_path '/data/datasets/alpaca_data_cleaned_first_100.json' --output_dir './lora-alpaca' --val_set_size 0
$ ls -alh lora-alpaca
total 16K
drwxrwxr-x 2 ubuntu ubuntu 4.0K Apr  9 12:55 .
drwxrwxr-x 7 ubuntu ubuntu 4.0K Apr  9 12:54 ..
-rw-rw-r-- 1 ubuntu ubuntu  350 Apr  9 12:55 adapter_config.json
-rw-rw-r-- 1 ubuntu ubuntu  443 Apr  9 12:55 adapter_model.bin

(adapter_model.bin should normally be around 16 MB)
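For reference, the alpaca_data_cleaned_first_100.json file used in the repro can be produced with a short script like this (the function name and paths are my own; the dataset is a plain JSON array):

```python
import json


def take_first_n(src: str, dst: str, n: int = 100) -> int:
    """Copy the first n records of a JSON-array dataset to a new file."""
    with open(src) as f:
        data = json.load(f)
    subset = data[:n]
    with open(dst, "w") as f:
        json.dump(subset, f, indent=2)
    return len(subset)
```

For example, `take_first_n("alpaca_data_cleaned.json", "alpaca_data_cleaned_first_100.json")` writes the truncated dataset next to the original.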

Running on Lambda Cloud A10 instance.
