🌟 T5 V1.1 #6285

Closed
Description

@timoschick

🌟 New model addition

Model description

T5 version t5.1.1.* is very similar to the original T5 model, with the following differences:

  • GEGLU activation in the feed-forward hidden layer, rather than ReLU (see https://arxiv.org/abs/2002.05202).
  • Dropout was turned off in pre-training (quality win). Dropout should be re-enabled during fine-tuning.
  • Pre-trained on C4 only, without mixing in the downstream tasks.
  • No parameter sharing between the embedding and classifier layers.
  • "xl" and "xxl" replace "3B" and "11B". The model shapes are slightly different: larger d_model and smaller num_heads and d_ff.
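The GEGLU feed-forward mentioned in the first bullet replaces the original `relu(x @ w_in) @ w_out` block with a gated variant: one projection is passed through GELU and used to gate a second, parallel projection. A minimal sketch with toy shapes (numpy instead of a deep-learning framework; the weight names `w_gate`, `w_up`, `w_down` are illustrative, not the checkpoint parameter names):

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def geglu_ffn(x, w_gate, w_up, w_down):
    # GEGLU feed-forward: gelu(x @ w_gate) elementwise-gates (x @ w_up),
    # then the result is projected back down to d_model.
    return (gelu(x @ w_gate) * (x @ w_up)) @ w_down

# Toy shapes: d_model=4, d_ff=8
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4))
w_gate = rng.normal(size=(4, 8))
w_up = rng.normal(size=(4, 8))
w_down = rng.normal(size=(8, 4))

out = geglu_ffn(x, w_gate, w_up, w_down)
print(out.shape)  # (2, 4)
```

Note the gated variant carries three weight matrices instead of two, which is part of why the v1.1 model shapes (larger d_model, smaller d_ff) differ from the original releases.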

The key reason these models are interesting is that, unlike the originally released models, they were trained only on unlabeled data and not on any labeled data, making them applicable for few-shot learning experiments. Since they are very similar to the original T5 models, they should be relatively easy to implement.

Open source status

(Also tagging @patrickvonplaten as he is mentioned in the who to tag guide for T5)
