
model.save_pretrained() produced a corrupted adapter_model.bin (only 443 B) with alpaca-lora #286

Closed
@zetavg

Description

I recently found that when fine-tuning with alpaca-lora, model.save_pretrained() saves an adapter_model.bin that is only 443 B.

This seems to have started after peft@75808eb2a6e7b4c3ed8aec003b6eeb30a2db1495.

Normally, adapter_model.bin should be larger than 16 MB. When the 443 B adapter_model.bin is loaded, the model behaves as if it had not been fine-tuned at all. In contrast, loading other checkpoints from the same training run works as expected.

drwxrwxr-x 2 ubuntu ubuntu 4.0K Apr  9 12:55 .
drwxrwxr-x 7 ubuntu ubuntu 4.0K Apr  9 12:54 ..
-rw-rw-r-- 1 ubuntu ubuntu  350 Apr  9 12:55 adapter_config.json
-rw-rw-r-- 1 ubuntu ubuntu  443 Apr  9 12:55 adapter_model.bin
drwxr-xr-x 2 ubuntu ubuntu 4.0K Apr  9 12:06 checkpoint-400
drwxr-xr-x 2 ubuntu ubuntu 4.0K Apr  9 12:06 checkpoint-600
drwxr-xr-x 2 ubuntu ubuntu 4.0K Apr  9 12:07 checkpoint-800

I'm not sure whether this is an issue with peft, or whether it duplicates another issue, but I'm leaving it here for reference.
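As a quick sanity check before loading an adapter, a helper like this (the threshold and function name are my own, not part of peft) flags files that are implausibly small, such as the 443 B one above:

```python
import os

# Rough lower bound for a healthy LoRA adapter: a normal alpaca-lora
# adapter_model.bin is > 16 MB, while the corrupted one is only 443 B.
MIN_ADAPTER_BYTES = 1 * 1024 * 1024  # 1 MiB


def adapter_looks_corrupted(path: str) -> bool:
    """Return True if the saved adapter file is suspiciously small."""
    return os.path.getsize(path) < MIN_ADAPTER_BYTES
```

One could call this on `os.path.join(output_dir, "adapter_model.bin")` right after `model.save_pretrained(output_dir)` to fail fast instead of discovering the problem at inference time.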

I've been testing with multiple versions of peft:

  • 072da6d9d62 works
  • 382b178911edff38c1ff619bbac2ba556bd2276b works
  • 75808eb2a6e7b4c3ed8aec003b6eeb30a2db1495 not working
  • 445940fb7b5d38390ffb6707e2a989e89fff03b5 not working
  • 1a6151b91fcdcc25326b9807d7dbf54e091d506c not working
  • 1117d4772109a098787ce7fc297cb6cd641de6eb not working
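To bracket the regression between the commits above, each one can be installed directly from GitHub with pip's VCS syntax. This sketch only prints the install commands for the last known-good and first known-bad hashes; run the printed command in the test environment:

```shell
# Last known-good and first known-bad peft commits from the list above.
good=382b178911edff38c1ff619bbac2ba556bd2276b
bad=75808eb2a6e7b4c3ed8aec003b6eeb30a2db1495

# Print the pip command for each commit; paste the one you want to test.
for c in "$good" "$bad"; do
  echo "pip install git+https://github.com/huggingface/peft.git@$c"
done
```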

Steps to reproduce:

conda create python=3.8 -n test
conda activate test
git clone https://github.com/tloen/alpaca-lora.git
cd alpaca-lora
pip install -r requirements.txt

# to work around AttributeError: bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats
cd /home/ubuntu/miniconda3/envs/test/lib/python3.8/site-packages/bitsandbytes/
mv libbitsandbytes_cpu.so libbitsandbytes_cpu.so.bak
cp libbitsandbytes_cuda121.so libbitsandbytes_cpu.so
cd -
conda install cudatoolkit

# alpaca_data_cleaned_first_100.json is alpaca_data_cleaned.json trimmed to the first 100 items; --val_set_size 0 is set because there isn't enough data to build a validation set
python finetune.py --base_model 'decapoda-research/llama-7b-hf' --data_path '/data/datasets/alpaca_data_cleaned_first_100.json' --output_dir './lora-alpaca' --val_set_size 0
$ ls -alh lora-alpaca
total 16K
drwxrwxr-x 2 ubuntu ubuntu 4.0K Apr  9 12:55 .
drwxrwxr-x 7 ubuntu ubuntu 4.0K Apr  9 12:54 ..
-rw-rw-r-- 1 ubuntu ubuntu  350 Apr  9 12:55 adapter_config.json
-rw-rw-r-- 1 ubuntu ubuntu  443 Apr  9 12:55 adapter_model.bin

(adapter_model.bin should normally be around 16 MB)
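For reference, the alpaca_data_cleaned_first_100.json file used in the repro can be produced with a short script like this (the function name and paths are my own; the dataset is a plain JSON array):

```python
import json


def take_first_n(src: str, dst: str, n: int = 100) -> int:
    """Copy the first n records of a JSON-array dataset to a new file."""
    with open(src) as f:
        data = json.load(f)
    subset = data[:n]
    with open(dst, "w") as f:
        json.dump(subset, f, indent=2)
    return len(subset)
```

For example, `take_first_n("alpaca_data_cleaned.json", "alpaca_data_cleaned_first_100.json")` writes the truncated dataset next to the original.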

Running on Lambda Cloud A10 instance.
