### System Info
- `transformers` version: 4.35.2
- Platform: Linux-6.1.58+-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.19.4
- Safetensors version: 0.4.1
- Accelerate version: 0.25.0
- Accelerate config: not found
- PyTorch version (GPU?): 2.1.0+cu121 (True)
- Tensorflow version (GPU?): 2.15.0 (True)
- Flax version (CPU?/GPU?/TPU?): 0.7.5 (gpu)
- Jax version: 0.4.20
- JaxLib version: 0.4.20
- Using GPU in script?: T4
- Using distributed or parallel set-up in script?: No
### Who can help?

### Information
- The official example scripts
- My own modified scripts
### Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
### Reproduction
I am trying to fine-tune a T5-base model, but I have been facing issues despite following the step-by-step guide found on the Hugging Face Hub here.

So far, this is my code:
```python
import transformers
from datasets import load_dataset
from transformers import AutoTokenizer

transformers.logging.set_verbosity_error()

canard_train_augm = load_dataset("gaussalgo/Canard_Wiki-augmented", split="train")
canard_test_augm = load_dataset("gaussalgo/Canard_Wiki-augmented", split="test")

model_name = "t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)

def preprocess_function(examples):
    # Pair each question with its supporting contexts as the model input.
    combined_input = examples["Question"] + ": " + examples["true_contexts"]
    return tokenizer(combined_input, examples["Rewrite"], max_length=512,
                     padding="max_length", truncation=True, return_tensors="pt")

tokenized_train = canard_train_augm.map(preprocess_function)
tokenized_test = canard_test_augm.map(preprocess_function)
```
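For comparison, the seq2seq task guides tokenize the targets through the `text_target` keyword, so that the processed dataset contains a `labels` field; here is a minimal sketch of that variant (the name `preprocess_with_labels` is mine, and it assumes `map(..., batched=True)`):

```python
def preprocess_with_labels(examples):
    # Batched variant: every field is a list of strings here.
    inputs = [q + ": " + c for q, c in zip(examples["Question"], examples["true_contexts"])]
    # `text_target` tokenizes the rewrites into `labels` instead of treating
    # them as the second segment of a sentence pair.
    return tokenizer(inputs, text_target=examples["Rewrite"], max_length=512, truncation=True)

tokenized_train_alt = canard_train_augm.map(preprocess_with_labels, batched=True)
```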
```python
from transformers import DataCollatorForSeq2Seq

data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model_name)
```
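As far as I can tell from the collator's source, `DataCollatorForSeq2Seq` only prepares `decoder_input_ids` when its `model` argument actually has a `prepare_decoder_input_ids_from_labels` method, so passing the name string instead of the model instance silently skips that step. The variant I would have expected is something like this (a sketch):

```python
from transformers import AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq

# Pass the instantiated model, not its name, so the collator can build
# decoder_input_ids from the padded labels.
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model)
```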
```python
import evaluate
import numpy as np

metric = evaluate.load("sacrebleu")

def postprocess_text(preds, labels):
    preds = [pred.strip() for pred in preds]
    labels = [[label.strip()] for label in labels]
    return preds, labels

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    if isinstance(preds, tuple):
        preds = preds[0]
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    # Replace -100 in the labels, since it cannot be decoded.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    decoded_preds, decoded_labels = postprocess_text(decoded_preds, decoded_labels)
    result = metric.compute(predictions=decoded_preds, references=decoded_labels)
    result = {"bleu": result["score"]}
    prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]
    result["gen_len"] = np.mean(prediction_lens)
    result = {k: round(v, 4) for k, v in result.items()}
    return result
```
```python
from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer

model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

training_args = Seq2SeqTrainingArguments(
    output_dir="wtf",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=2,
    predict_with_generate=True,
    fp16=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_test,
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

trainer.train()
```

I tried several examples, including my own customized `Trainer` subclass, but I always ended up with the same issue, even when I ran the exact code from the step-by-step guide provided by Hugging Face.
The error happens when calling `trainer.train()`, which raises the following:

```
ValueError: too many values to unpack (expected 2)
```
I followed the exact same format as the documentation. I believe something goes wrong when the loss function is called, but I was unable to put my finger on it; if anyone can help, that would be great.
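For what it is worth, a quick way to check my suspicion that the targets never end up in a `labels` field (this is a guess on my part, not a confirmed diagnosis):

```python
# If "labels" is missing here, the rewrites were tokenized as the second
# segment of a sentence pair rather than as decoder targets.
print(tokenized_train[0].keys())
```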
### Expected behavior
Being able to fine-tune the T5 model on the above dataset, by identifying and eliminating the cause of the error.