Facing issues when trying to fine-tune T5 #28111

### System Info

- `transformers` version: 4.35.2
- Platform: Linux-6.1.58+-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.19.4
- Safetensors version: 0.4.1
- Accelerate version: 0.25.0
- Accelerate config: 	not found
- PyTorch version (GPU?): 2.1.0+cu121 (True)
- Tensorflow version (GPU?): 2.15.0 (True)
- Flax version (CPU?/GPU?/TPU?): 0.7.5 (gpu)
- Jax version: 0.4.20
- JaxLib version: 0.4.20
- Using GPU in script?: T4
- Using distributed or parallel set-up in script?: No

### Who can help?

@ArthurZucker @youne

### Information

- [ ] The official example scripts
- [X] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)

### Reproduction

I am trying to fine-tune a T5 model (`t5-small` in the code below) but keep running into an error, despite following the step-by-step guide in the Hugging Face documentation [here](https://huggingface.co/docs/transformers/tasks/translation).

So far this is my code:
`transformers.logging.set_verbosity_error()`

```python
from datasets import load_dataset

canard_train_augm = load_dataset("gaussalgo/Canard_Wiki-augmented", split="train")
canard_test_augm = load_dataset("gaussalgo/Canard_Wiki-augmented", split="test")

from transformers import AutoTokenizer

model_name = "t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)

def preprocess_function(examples):
    combined_input = examples["Question"] + ": " + examples["true_contexts"]
    return tokenizer(combined_input, examples["Rewrite"], max_length=512, padding="max_length", truncation=True, return_tensors="pt")

tokenized_train = canard_train_augm.map(preprocess_function)
tokenized_test = canard_test_augm.map(preprocess_function)

from transformers import DataCollatorForSeq2Seq

data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model_name)

import evaluate

metric = evaluate.load("sacrebleu")

import numpy as np


def postprocess_text(preds, labels):
    preds = [pred.strip() for pred in preds]
    labels = [[label.strip()] for label in labels]

    return preds, labels

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    if isinstance(preds, tuple):
        preds = preds[0]
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)

    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    decoded_preds, decoded_labels = postprocess_text(decoded_preds, decoded_labels)

    result = metric.compute(predictions=decoded_preds, references=decoded_labels)
    result = {"bleu": result["score"]}

    prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]
    result["gen_len"] = np.mean(prediction_lens)
    result = {k: round(v, 4) for k, v in result.items()}
    return result

from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer

model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

training_args = Seq2SeqTrainingArguments(
    output_dir="wtf",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=2,
    predict_with_generate=True,
    fp16=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_test,
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

trainer.train()
```

I tried several variants, including my own custom subclass of the trainer, but always ended up with the same error, even when I ran the same code found in the step-by-step guide provided by Hugging Face.

The error is raised when calling `trainer.train()`:
`ValueError: too many values to unpack (expected 2)`

I followed the exact same format as the documentation. I believe the problem occurs when the loss function is called, but I was unable to pin it down. If anyone can help, that would be great.
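One plausible cause, offered as an assumption rather than a confirmed diagnosis: `preprocess_function` is mapped one example at a time with `return_tensors="pt"`, so each stored `input_ids` would carry a leading axis of size 1. Once the collator stacks eight such examples, the batch would be three-dimensional, and any code that unpacks the input shape into exactly two names would raise this `ValueError`. The sketch below reproduces only that shape arithmetic in plain Python; `shape` and `unpack_input_shape` are hypothetical helpers, not functions from `transformers`.

```python
def shape(nested):
    """Return the shape of a regular nested list, e.g. [[1, 2, 3]] -> (1, 3)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


def unpack_input_shape(input_shape):
    """Hypothetical stand-in for model code that expects (batch, seq_len)."""
    batch_size, seq_length = input_shape
    return batch_size, seq_length


SEQ_LEN = 512   # max_length used in the report
BATCH = 8       # per_device_train_batch_size used in the report

# Assumption: per-example tokenization with return_tensors="pt" keeps a
# leading axis of size 1, while plain lists do not.
example_with_extra_axis = [[0] * SEQ_LEN]   # shape (1, 512)
example_flat = [0] * SEQ_LEN                # shape (512,)

batch_bad = [example_with_extra_axis for _ in range(BATCH)]
batch_ok = [example_flat for _ in range(BATCH)]

print(shape(batch_ok))    # two dimensions: batch and sequence
print(shape(batch_bad))   # three dimensions: batch, extra axis, sequence

unpack_input_shape(shape(batch_ok))  # works

try:
    unpack_input_shape(shape(batch_bad))
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)
```

If this is indeed the cause, printing the shape of one collated batch's `input_ids` before training would show the extra dimension.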

### Expected behavior

I expect to be able to fine-tune the T5 model on the above dataset once the cause of the error has been identified and eliminated.
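A minimal sketch of a preprocessing step that may avoid the error, under two assumptions that are not confirmed here: that the extra tensor dimension from `return_tensors="pt"` is the culprit, and that the `Rewrite` column is meant to be the target sequence rather than a second input segment. The factory name `build_preprocess` is hypothetical; the `text_target` argument is the tokenizer parameter the linked translation guide uses for labels.

```python
def build_preprocess(tokenizer, max_length=512):
    """Return a batched preprocessing function for datasets.map.

    Hypothetical helper: `tokenizer` is any callable that accepts
    (text, text_target=..., max_length=..., truncation=...), such as
    a Hugging Face tokenizer.
    """

    def preprocess(examples):
        # In batched mode each column arrives as a list of strings.
        inputs = [
            question + ": " + context
            for question, context in zip(
                examples["Question"], examples["true_contexts"]
            )
        ]
        # No return_tensors and no max_length padding: plain lists are
        # stored, and padding is left to the data collator.
        return tokenizer(
            inputs,
            text_target=examples["Rewrite"],
            max_length=max_length,
            truncation=True,
        )

    return preprocess
```

It would be applied as `canard_train_augm.map(build_preprocess(tokenizer), batched=True)`. Leaving padding to `DataCollatorForSeq2Seq` means each batch is padded only to its own longest sequence, and label padding is replaced with `-100` so that it is ignored by the loss.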
