(LLama-2-13b-hf) - Qlora - SFT - RuntimeError: mat1 and mat2 shapes cannot be multiplied (3264x5120 and 1x2560) #202

@aldrinc

Description
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --model_name_or_path meta-llama/Llama-2-13b-hf \
    --do_train \
    --dataset oaast_sft \
    --finetuning_type lora \
    --quantization_bit 4 \
    --output_dir /workspace/llama-2-output \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 100 \
    --learning_rate 2e-5 \
    --num_train_epochs 0.5 \
    --plot_loss \
    --fp16

Server startup script

pip install --upgrade huggingface_hub
huggingface-cli login --token $HF_TOKEN
git clone https://github.com/hiyouga/LLaMA-Efficient-Tuning.git
cd LLaMA-Efficient-Tuning
pip install -r requirements.txt
pip install "bitsandbytes>=0.39.0"  # quotes needed, otherwise the shell treats >= as a redirection
pip install scipy
pip install -U git+https://github.com/huggingface/peft.git

An older closed issue suggests that upgrading PEFT (peft-0.5.0.dev0) fixes this, but I still receive the same error after upgrading.

This only happens with Llama-2-13b-hf; I am able to successfully SFT Vicuna and other models.
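For what it's worth, the reported shapes look consistent with a 4-bit quantization problem rather than a LoRA configuration problem: Llama-2-13b's hidden size is 5120, and bitsandbytes packs two 4-bit values per byte, so a packed weight that is fed to a matmul without being dequantized would show up with exactly half the expected elements (2560 = 5120 / 2). This interpretation is an assumption on my part, not confirmed by the maintainers, but the arithmetic lines up:

```python
# Hedged sketch: why the mismatched dimension is exactly half the hidden size.
# Assumption: the "1x2560" operand in the error is a 4-bit-packed weight
# that reached a matmul without being dequantized first.

HIDDEN_SIZE = 5120           # Llama-2-13b hidden dimension (from the model config)
BITS = 4                     # matches --quantization_bit 4
VALUES_PER_BYTE = 8 // BITS  # bitsandbytes stores two 4-bit values per uint8

packed_dim = HIDDEN_SIZE // VALUES_PER_BYTE
print(packed_dim)  # 2560 — matches the "1x2560" side of the error

# The activations (3264x5120) need a weight with 5120 rows, so the matmul
# can only succeed if the library dequantizes the packed weight back to
# its full shape before (or inside) the forward pass.
assert packed_dim == 2560
```

If that reading is right, it would explain why upgrading PEFT was suggested in the older issue: newer PEFT/bitsandbytes versions handle `Params4bit` layers during the LoRA forward pass.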

Labels: solved (This problem has been already solved)