🌟 New model addition
Model description
T5 version t5.1.1.* is very similar to the original T5 model, with the following differences:
- GEGLU activation in the feed-forward hidden layer, rather than ReLU - see https://arxiv.org/abs/2002.05202 (a rough sketch of this block follows the list below).
- Dropout was turned off in pre-training (quality win). Dropout should be re-enabled during fine-tuning.
- Pre-trained on C4 only without mixing in the downstream tasks.
- No parameter sharing between the embedding and classifier layers.
- "xl" and "xxl" replace "3B" and "11B". The model shapes are a bit different - larger d_model and smaller num_heads and d_ff.
The key reason these models are interesting is that, unlike the originally released checkpoints, they were trained only on unlabeled data and never on any labeled data, which makes them suitable for few-shot learning experiments. Since they are very similar to the original T5 models, I assume they would be relatively easy to implement.
Open source status
- the model implementation is available: yes, see https://github.com/google-research/text-to-text-transfer-transformer/
- the model weights are available: yes, see https://github.com/google-research/text-to-text-transfer-transformer/blob/master/released_checkpoints.md
- who are the authors: Colin Raffel (@craffel), Noam Shazeer (@nshazeer), Adam Roberts (@adarob), Katherine Lee, Sharan Narang, Michael Matena (@mmatena), Yanqi Zhou, Wei Li, Peter J. Liu
(Also tagging @patrickvonplaten, as he is mentioned in the "who to tag" guide for T5.)