Description
Environment info
- transformers version: 3.3.1
- Platform: Linux-4.4.0-116-generic-x86_64-with-glibc2.10
- Python version: 3.8.5
- PyTorch version (GPU?): 1.6.0 (True)
- Tensorflow version (GPU?): not installed (NA)
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: (tried with both 1 and 2 GPUs)
Who can help
Summarization: @sshleifer
T5: @patrickvonplaten
examples/seq2seq: @sshleifer
Information
I am trying to fine-tune T5 on a custom dataset. I posted about my specific use case here in the forums: https://discuss.huggingface.co/t/t5-tips-for-finetuning-on-crossword-clues-clue-answer/1514
The problem arises when using:
- [ ] the official example scripts: (give details below)
- [ ] my own modified scripts: (give details below)
The task I am working on is:
- [ ] an official GLUE/SQuAD task: (give the name)
- [x] my own task or dataset: (details below)
To reproduce
- clone transformers from master
- pip install -e . ; pip install -r requirements.txt
- cd examples/seq2seq
- modify the finetune_t5.sh script to point at a local dataset (data_set/[val|test|train].[source|target])
(Note that I have changed nothing else)
python finetune.py \
  --model_name_or_path=t5-small \
  --tokenizer_name=t5-small \
  --data_dir=${HOME}/data_set \
  --learning_rate=3e-4 \
  --output_dir=$OUTPUT_DIR \
  --max_source_length=100 \
  --max_target_length=100 \
  --num_train_epochs=300 \
  --train_batch_size=64 \
  --eval_batch_size=64 \
  --gpus=1 \
  --auto_select_gpus=True \
  --save_top_k=3 \
  --do_train \
  --do_predict \
  "$@"
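For reference, generations can also be inspected outside of the script with something like the sketch below. It assumes finetune.py has exported a Hugging Face-format checkpoint under $OUTPUT_DIR (I use a best_tfmr/ subdirectory here, but that exact layout is an assumption and may differ):

```python
# Minimal generation sanity check. CHECKPOINT_DIR is an assumption about where
# finetune.py exports the HF-format model; adjust it to the actual output layout.
import os
from transformers import T5ForConditionalGeneration, T5Tokenizer

CHECKPOINT_DIR = os.path.join(os.environ["OUTPUT_DIR"], "best_tfmr")

tokenizer = T5Tokenizer.from_pretrained(CHECKPOINT_DIR)
model = T5ForConditionalGeneration.from_pretrained(CHECKPOINT_DIR)

# Encode one test line and greedily decode up to 100 tokens
inputs = tokenizer("We raised a bloom, a monster", return_tensors="pt")
generated = model.generate(inputs.input_ids, max_length=100)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```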
As a baseline "does T5 work at all" check, my input/output pairs are of the form (one example per line):
(this is one line in train.source): This is a sentence
(this is corresponding line in train.target): This
The lines are exactly as above, with a newline after each example and no other punctuation. I have not modified the tokenizer or the model.
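For concreteness, here is a minimal sketch of how such a toy dataset could be written out (the sentences are placeholders, not my real data; file names follow the {train,val,test}.{source,target} convention the example scripts expect):

```python
# Sketch of the toy dataset layout: each target line is the first word of the
# corresponding source line. The sentences below are placeholders.
from pathlib import Path

sentences = [
    "This is a sentence",
    "Another short example line",
]

data_dir = Path.home() / "data_set"
data_dir.mkdir(exist_ok=True)

for split in ["train", "val", "test"]:
    with open(data_dir / f"{split}.source", "w") as src, \
         open(data_dir / f"{split}.target", "w") as tgt:
        for line in sentences:
            src.write(line + "\n")
            tgt.write(line.split()[0] + "\n")
```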
Expected behavior
I expect T5 to learn to output the first word of each input line.
Observed behavior
T5 outputs the first word followed by gibberish.
After 300 epochs, here are the first 5 lines of test.source vs. test_generations (test.target is just the first word of each line in test.source):
test.source:
We raised a bloom, a monster
I let Satan corrupt and torment
Chapter in play is an old piece
Old skin disease liable to drain confidence
Keep a riot going inside a musical academy
test_generations:
We vsahmoastuosastostassymbossa
Issahrastahmoormentostormentastoshomment
Chapter vshygie'ny-futtahraffahtaftast
Old hygienohmahrastassahuasairtia
Keep'astifiahuassaivrasastoshygiesana
I wonder if any of the following could be affecting this:
- choice of loss function
- a corrupted character somewhere in one of the input/output files (a quick check is sketched after this list)
- choice of task (I think it defaults to summarization)
- need more epochs?
- some other parameter to change?
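To rule out the corrupted-character hypothesis (and to see exactly what the model receives), here is a minimal diagnostic sketch; it assumes the data lives under ~/data_set as above, and the checks themselves are mine, not part of the example scripts:

```python
# 1) Flag any non-printable / non-ASCII characters in the data files.
# 2) Round-trip a sample line through the T5 tokenizer to see what the model sees.
from pathlib import Path
from transformers import T5Tokenizer

data_dir = Path.home() / "data_set"

for path in sorted(data_dir.glob("*.source")) + sorted(data_dir.glob("*.target")):
    for i, line in enumerate(path.read_text().splitlines(), start=1):
        suspicious = [c for c in line if ord(c) < 32 or ord(c) > 126]
        if suspicious:
            print(f"{path.name}:{i} suspicious characters: {suspicious!r}")

tokenizer = T5Tokenizer.from_pretrained("t5-small")
sample = "We raised a bloom, a monster"
ids = tokenizer(sample).input_ids
print(ids)
print(tokenizer.decode(ids))
```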