Llama 3 - RuntimeError: shape '[-1, 0]' is invalid for input of size 41041920 #32170
Comments
What's weird is that we did not change the llama code per se, but we did change
Arf, could you try without this commit that I linked?
Apart from this one, #32135 is the only potential culprit I see. I don't have access to the script, so if any of you can isolate a small reproducer it would help a lot!
I encountered the same issue. I tried running the script to fine-tune Llama 3.0, Llama 3.1, and Mistral 7B v0.3 with transformers versions 4.43.0 and 4.43.1, but hit the same error. However, version 4.42.4 works fine for all the base models. @ArthurZucker
Ouch, maybe #31446 if you can revert it.
@ArthurZucker I confirmed it by adding a print right before the failing reshape in the loss computation:

```python
print(f"#### SELF.CONFIG.VOCAB_SIZE: {self.config.vocab_size}")
shift_logits = shift_logits.view(-1, self.config.vocab_size)
```

Output: `#### SELF.CONFIG.VOCAB_SIZE: 0`
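In other words, the reshape is being asked for a zero-sized vocab dimension. A minimal standalone sketch (hypothetical tensor shapes, not taken from this thread) reproduces the same error message:

```python
import torch

# Hypothetical shapes: (batch, seq_len, vocab) logits, shifted as in the modeling code.
logits = torch.randn(2, 10, 128)
shift_logits = logits[..., :-1, :].contiguous()

vocab_size = 0  # what a zeroed-out config.vocab_size would pass to .view()
try:
    shift_logits.view(-1, vocab_size)
except RuntimeError as e:
    print(e)  # shape '[-1, 0]' is invalid for input of size 2304
```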
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import os
import torch

device = "cuda"
ckpt = "meta-llama/Meta-Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(ckpt, attn_implementation="flash_attention_2", torch_dtype=torch.float16)
model.to(device)
tokenizer = AutoTokenizer.from_pretrained(ckpt)

prompt = ["Explain the thre body problem", "What is this?"]
inputs = tokenizer(prompt, return_tensors="pt", padding=True).to("cuda")

outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss)
outputs = model(inputs["input_ids"], labels=inputs["input_ids"])
```

I ran something like this, which worked for me, so I don't really know what's going on here 😓
@ArthurZucker I found it with the following debug prints around the embedding resize in our script:

```python
print(f"### Model Config 1: {model.config}")
# resize does its own gather
if len(tokenizer) > embedding_size:
    # pad to multiple for tensor cores.
    model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=8)
print(f"### Length of tokenizer: {len(tokenizer)}")
print(f"### Model Config 2: {model.config}")
exit(1)
```

Output:
@ArthurZucker Reverting the commit (#31979) resolved the issue. |
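For anyone who wants a quick check outside the full training setup, here is a hedged, self-contained sketch of the same before/after comparison around `resize_token_embeddings` (checkpoint name taken from the report; access to the gated repo and a plain single-process run are assumed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "meta-llama/Meta-Llama-3.1-8B"  # gated checkpoint; access assumed
model = AutoModelForCausalLM.from_pretrained(ckpt)
tokenizer = AutoTokenizer.from_pretrained(ckpt)

embedding_size = model.get_input_embeddings().weight.shape[0]
print(f"before: config.vocab_size={model.config.vocab_size}, embedding rows={embedding_size}")

if len(tokenizer) > embedding_size:
    # pad to a multiple of 8 for tensor cores, as in the fine-tuning script
    model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=8)

new_embedding_size = model.get_input_embeddings().weight.shape[0]
print(f"after:  config.vocab_size={model.config.vocab_size}, embedding rows={new_embedding_size}")

# The loss reshape relies on these staying in sync.
assert model.config.vocab_size == new_embedding_size
```

Note that the report runs under DeepSpeed ZeRO-3 (stage3 config in the launch command), so a plain run like this may not trip the same code path; it only shows what the prints above are comparing.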
Hey! Saw your comment under the linked PR. I just tried the below from the current main:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
vocab_size = model.vocab_size
model.resize_token_embeddings(vocab_size, pad_to_multiple_of=8)
assert model.vocab_size != 0
assert model.config.vocab_size == vocab_size
assert model.vocab_size == vocab_size
```
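For completeness, a sketch closer to the fine-tuning path in the report would also grow the tokenizer first, since the script only resizes when `len(tokenizer) > embedding_size`. The added pad token below is hypothetical (an assumption about the script's setup, not taken from this thread):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "meta-llama/Meta-Llama-3.1-8B"  # gated checkpoint; access assumed
model = AutoModelForCausalLM.from_pretrained(ckpt)
tokenizer = AutoTokenizer.from_pretrained(ckpt)

# Hypothetical step: add a pad token so len(tokenizer) exceeds the embedding size,
# which is the branch the fine-tuning script takes before resizing.
tokenizer.add_special_tokens({"pad_token": "<pad>"})
model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=8)

# The loss path is where the reshape with config.vocab_size happens.
inputs = tokenizer(["hello world"], return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])
print(model.config.vocab_size, outputs.loss)
```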
System Info
transformers version 4.43.1, other package versions here: https://github.com/allenai/open-instruct/blob/main/requirements.txt
Who can help?
@ArthurZucker
Information

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Running:

```
unset CUDA_LAUNCH_BLOCKING && accelerate launch \
    --mixed_precision bf16 \
    --num_machines 2 \
    --num_processes 16 \
    --machine_rank $BEAKER_REPLICA_RANK \
    --main_process_ip $BEAKER_LEADER_REPLICA_HOSTNAME \
    --main_process_port 29400 \
    --use_deepspeed \
    --deepspeed_config_file configs/ds_configs/stage3_no_offloading_accelerate.conf \
    --deepspeed_multinode_launcher standard \
    open_instruct/finetune.py \
    --model_name_or_path meta-llama/Meta-Llama-3.1-8B \
    --tokenizer_name meta-llama/Meta-Llama-3.1-8B \
    --use_slow_tokenizer \
    --dataset_name allenai/tulu-v2-sft-mixture \
    --use_flash_attn \
    --max_seq_length 4096 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --learning_rate 5e-6 \
    --lr_scheduler_type linear \
    --warmup_ratio 0.03 \
    --weight_decay 0. \
    --num_train_epochs 2 \
    --output_dir /output/ \
    --with_tracking \
    --report_to tensorboard \
    --logging_steps 1 \
    --reduce_loss sum
```
using open-instruct, we encounter this error on the first step of finetuning:

`RuntimeError: shape '[-1, 0]' is invalid for input of size 41041920`

This started after updating to transformers 4.43.1 to support Llama 3.1 finetuning. Any idea what's going on? We're not sure if other packages need to be updated, if this is a known issue, or something else.
Expected behavior
Llama 3.1 finetuning to run successfully