
[WIP] Add Fusion-in-Decoder #15902

Draft · wants to merge 2 commits into main
Conversation

bhavitvyamalik (Contributor)

What does this PR do?

This PR adds the Fusion-in-Decoder (FiD) model to the repository. FiD is a T5-based reader for open-domain question answering: each retrieved passage is encoded independently together with the question, and the decoder fuses the evidence by attending over all encoded passages jointly.

Paper: https://arxiv.org/abs/2007.01282
Code: https://github.com/facebookresearch/FiD
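
For reviewers unfamiliar with the model, here is a small sketch of the input layout. The `question:/title:/context:` prefixes and max length of 200 match the defaults in the upstream data code, if I read it correctly; the tokenizer choice and example passages are purely illustrative. An EncoderWrapper-style forward (sketched later in this thread) splits the flattened batch back into per-passage inputs.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-base")

question = "where was marie curie born?"
# (title, text) pairs as a retriever would return them; contents are made up.
passages = [
    ("Marie Curie", "Maria Sklodowska was born in Warsaw in 1867 ..."),
    ("Warsaw", "Warsaw is the capital and largest city of Poland ..."),
]

# One encoder input per passage: the question is repeated in front of each one.
texts = [f"question: {question} title: {title} context: {text}" for title, text in passages]
enc = tokenizer(texts, padding="max_length", truncation=True, max_length=200, return_tensors="pt")

# Flatten to (batch, n_passages * seq_len); the encoder wrapper later splits
# this back into (batch * n_passages, seq_len) before encoding.
input_ids = enc.input_ids.view(1, -1)
attention_mask = enc.attention_mask.view(1, -1)
```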

Who can review?

Anyone in the community is free to review the PR once the tests have passed.
@patil-suraj, @patrickvonplaten, @qqaatw

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@bhavitvyamalik (Contributor, Author) commented Mar 2, 2022

In their model code, they wrap T5ForConditionalGeneration with EncoderWrapper and CheckpointWrapper. The model also loads without them, but it then emits warnings like:

Some weights of the model checkpoint at ../../../FiD/pretrained_models/nq_reader_base/ were not used when initializing T5ForConditionalGeneration: ['encoder.encoder.block.10.module.layer.0.SelfAttention.k.weight', 'encoder.encoder.block.3.module.layer.0.SelfAttention.q.weight', 'encoder.encoder.block.0.module.layer.0.layer_norm.weight', 'encoder.encoder.block.10.module.layer.0.SelfAttention.o.weight', 'encoder.encoder.block.4.module.layer.1.DenseReluDense.wi.weight', 'encoder.encoder.block.11.module.layer.0.SelfAttention.v.weight', 'encoder.encoder.block.11.module.layer.0.layer_norm.weight', 'encoder.encoder.block.1.module.layer.1.DenseReluDense.wi.weight', 'encoder.encoder.block.4.module.layer.1.layer_norm.weight', 'encoder.encoder.block.5.module.layer.0.SelfAttention.q.weight', 'encoder.encoder.block.6.module.layer.0.SelfAttention.v.weight', 'encoder.encoder.block.5.module.layer.0.SelfAttention.v.weight', 'encoder.encoder.block.1.module.layer.1.DenseReluDense.wo.weight', 'encoder.encoder.block.4.module.layer.0.SelfAttention.q.weight', 'encoder.encoder.block.1.module.layer.0.SelfAttention.v.weight', 'encoder.encoder.block.2.module.layer.0.SelfAttention.o.weight', 'encoder.encoder.block.8.module.layer.1.layer_norm.weight', 'encoder.encoder.block.7.module.layer.0.layer_norm.weight', 'encoder.encoder.block.6.module.layer.1.DenseReluDense.wi.weight', 'encoder.encoder.block.10.module.layer.0.SelfAttention.q.weight', 'encoder.encoder.block.4.module.layer.0.SelfAttention.k.weight', 'encoder.encoder.block.9.module.layer.1.layer_norm.weight', 'encoder.encoder.block.9.module.layer.0.SelfAttention.o.weight', 'encoder.encoder.block.3.module.layer.1.DenseReluDense.wo.weight', 'encoder.encoder.block.2.module.layer.0.layer_norm.weight', 'encoder.encoder.block.11.module.layer.0.SelfAttention.o.weight', 'encoder.encoder.block.10.module.layer.0.layer_norm.weight', 'encoder.encoder.block.4.module.layer.0.SelfAttention.o.weight', 'encoder.encoder.block.9.module.layer.1.DenseReluDense.wo.weight', 'encoder.encoder.block.5.module.layer.0.SelfAttention.k.weight', 'encoder.encoder.block.6.module.layer.0.SelfAttention.k.weight', 'encoder.encoder.block.4.module.layer.1.DenseReluDense.wo.weight', 'encoder.encoder.block.1.module.layer.0.SelfAttention.q.weight', 'encoder.encoder.block.0.module.layer.1.DenseReluDense.wi.weight', 'encoder.encoder.block.3.module.layer.0.layer_norm.weight', 'encoder.encoder.block.3.module.layer.1.DenseReluDense.wi.weight', 'encoder.encoder.block.0.module.layer.0.SelfAttention.k.weight', 'encoder.encoder.block.9.module.layer.0.layer_norm.weight', 'encoder.encoder.block.2.module.layer.0.SelfAttention.k.weight', 'encoder.encoder.block.2.module.layer.0.SelfAttention.v.weight', 'encoder.encoder.block.0.module.layer.1.layer_norm.weight', 'encoder.encoder.block.1.module.layer.0.layer_norm.weight', 'encoder.encoder.block.5.module.layer.0.SelfAttention.o.weight', 'encoder.encoder.block.11.module.layer.0.SelfAttention.q.weight', 'encoder.encoder.block.5.module.layer.1.DenseReluDense.wi.weight', 'encoder.encoder.block.2.module.layer.1.layer_norm.weight', 'encoder.encoder.block.5.module.layer.0.layer_norm.weight', 'encoder.encoder.block.5.module.layer.1.DenseReluDense.wo.weight', 'encoder.encoder.block.3.module.layer.0.SelfAttention.k.weight', 'encoder.encoder.block.3.module.layer.1.layer_norm.weight', 'encoder.encoder.block.6.module.layer.0.layer_norm.weight', 'encoder.encoder.block.0.module.layer.0.SelfAttention.q.weight', 'encoder.encoder.block.8.module.layer.0.SelfAttention.q.weight', 
'encoder.encoder.block.0.module.layer.0.SelfAttention.relative_attention_bias.weight', 'encoder.encoder.block.5.module.layer.1.layer_norm.weight', 'encoder.encoder.block.11.module.layer.1.DenseReluDense.wo.weight', 'encoder.encoder.block.1.module.layer.1.layer_norm.weight', 'encoder.encoder.block.6.module.layer.1.layer_norm.weight', 'encoder.encoder.block.7.module.layer.0.SelfAttention.k.weight', 'encoder.encoder.block.0.module.layer.0.SelfAttention.v.weight', 'encoder.encoder.block.10.module.layer.1.DenseReluDense.wo.weight', 'encoder.encoder.block.7.module.layer.0.SelfAttention.v.weight', 'encoder.encoder.block.3.module.layer.0.SelfAttention.v.weight', 'encoder.encoder.block.9.module.layer.1.DenseReluDense.wi.weight', 'encoder.encoder.block.10.module.layer.1.DenseReluDense.wi.weight', 'encoder.encoder.block.2.module.layer.1.DenseReluDense.wo.weight', 'encoder.encoder.block.0.module.layer.0.SelfAttention.o.weight', 'encoder.encoder.block.1.module.layer.0.SelfAttention.k.weight', 'encoder.encoder.block.4.module.layer.0.layer_norm.weight', 'encoder.encoder.block.2.module.layer.1.DenseReluDense.wi.weight', 'encoder.encoder.block.9.module.layer.0.SelfAttention.k.weight', 'encoder.encoder.final_layer_norm.weight', 'encoder.encoder.block.6.module.layer.1.DenseReluDense.wo.weight', 'encoder.encoder.block.6.module.layer.0.SelfAttention.q.weight', 'encoder.encoder.block.10.module.layer.0.SelfAttention.v.weight', 'encoder.encoder.block.11.module.layer.0.SelfAttention.k.weight', 'encoder.encoder.block.7.module.layer.0.SelfAttention.q.weight', 'encoder.encoder.block.7.module.layer.0.SelfAttention.o.weight', 'encoder.encoder.block.10.module.layer.1.layer_norm.weight', 'encoder.encoder.block.9.module.layer.0.SelfAttention.v.weight', 'encoder.encoder.block.3.module.layer.0.SelfAttention.o.weight', 'encoder.encoder.block.11.module.layer.1.layer_norm.weight', 'encoder.encoder.block.8.module.layer.0.SelfAttention.o.weight', 'encoder.encoder.block.4.module.layer.0.SelfAttention.v.weight', 'encoder.encoder.block.6.module.layer.0.SelfAttention.o.weight', 'encoder.encoder.block.7.module.layer.1.DenseReluDense.wi.weight', 'encoder.encoder.block.7.module.layer.1.DenseReluDense.wo.weight', 'encoder.encoder.block.7.module.layer.1.layer_norm.weight', 'encoder.encoder.block.0.module.layer.1.DenseReluDense.wo.weight', 'encoder.encoder.block.8.module.layer.1.DenseReluDense.wi.weight', 'encoder.encoder.block.1.module.layer.0.SelfAttention.o.weight', 'encoder.encoder.block.9.module.layer.0.SelfAttention.q.weight', 'encoder.encoder.block.8.module.layer.0.SelfAttention.v.weight', 'encoder.encoder.block.8.module.layer.0.SelfAttention.k.weight', 'encoder.encoder.block.8.module.layer.1.DenseReluDense.wo.weight', 'encoder.encoder.block.2.module.layer.0.SelfAttention.q.weight', 'encoder.encoder.block.8.module.layer.0.layer_norm.weight', 'encoder.encoder.embed_tokens.weight', 'encoder.encoder.block.11.module.layer.1.DenseReluDense.wi.weight']
This IS expected if you are initializing T5ForConditionalGeneration from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing T5ForConditionalGeneration from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Some weights of T5ForConditionalGeneration were not initialized from the model checkpoint at ../../../FiD/pretrained_models/nq_reader_base/ and are newly initialized: ['encoder.block.1.layer.0.SelfAttention.k.weight', 'encoder.block.11.layer.1.layer_norm.weight', 'encoder.block.11.layer.1.DenseReluDense.wo.weight', 'encoder.block.11.layer.1.DenseReluDense.wi.weight', 'encoder.block.4.layer.0.SelfAttention.v.weight', 'encoder.block.6.layer.1.DenseReluDense.wi.weight', 'encoder.block.2.layer.0.layer_norm.weight', 'encoder.block.5.layer.1.DenseReluDense.wo.weight', 'encoder.block.10.layer.0.SelfAttention.o.weight', 'encoder.block.11.layer.0.SelfAttention.q.weight', 'encoder.block.11.layer.0.layer_norm.weight', 'encoder.block.8.layer.0.layer_norm.weight', 'encoder.block.1.layer.0.SelfAttention.v.weight', 'encoder.final_layer_norm.weight', 'encoder.block.3.layer.1.layer_norm.weight', 'encoder.block.5.layer.0.layer_norm.weight', 'encoder.block.0.layer.1.DenseReluDense.wo.weight', 'encoder.block.2.layer.1.DenseReluDense.wi.weight', 'encoder.block.9.layer.1.layer_norm.weight', 'encoder.block.6.layer.1.layer_norm.weight', 'encoder.block.10.layer.1.DenseReluDense.wo.weight', 'encoder.block.2.layer.0.SelfAttention.o.weight', 'encoder.block.7.layer.1.DenseReluDense.wi.weight', 'encoder.block.5.layer.1.DenseReluDense.wi.weight', 'encoder.block.9.layer.0.SelfAttention.o.weight', 'encoder.block.10.layer.0.SelfAttention.v.weight', 'encoder.block.2.layer.0.SelfAttention.k.weight', 'encoder.block.7.layer.0.SelfAttention.q.weight', 'encoder.block.0.layer.1.layer_norm.weight', 'encoder.block.10.layer.0.SelfAttention.k.weight', 'encoder.block.7.layer.0.SelfAttention.v.weight', 'encoder.block.3.layer.0.SelfAttention.v.weight', 'encoder.block.4.layer.0.SelfAttention.k.weight', 'encoder.block.7.layer.0.SelfAttention.o.weight', 'encoder.block.1.layer.1.DenseReluDense.wi.weight', 'encoder.block.0.layer.1.DenseReluDense.wi.weight', 'encoder.block.0.layer.0.SelfAttention.v.weight', 'encoder.block.5.layer.1.layer_norm.weight', 'encoder.block.5.layer.0.SelfAttention.o.weight', 'encoder.block.11.layer.0.SelfAttention.k.weight', 'encoder.block.11.layer.0.SelfAttention.v.weight', 'encoder.block.9.layer.0.SelfAttention.q.weight', 'encoder.block.7.layer.1.DenseReluDense.wo.weight', 'encoder.block.5.layer.0.SelfAttention.q.weight', 'encoder.block.8.layer.1.DenseReluDense.wi.weight', 'encoder.block.4.layer.0.SelfAttention.o.weight', 'encoder.block.3.layer.0.SelfAttention.k.weight', 'encoder.block.9.layer.1.DenseReluDense.wo.weight', 'encoder.block.10.layer.0.layer_norm.weight', 'encoder.block.6.layer.1.DenseReluDense.wo.weight', 'encoder.block.8.layer.0.SelfAttention.v.weight', 'encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight', 'encoder.block.7.layer.0.SelfAttention.k.weight', 'encoder.block.6.layer.0.SelfAttention.q.weight', 'encoder.block.7.layer.0.layer_norm.weight', 'encoder.block.5.layer.0.SelfAttention.k.weight', 'encoder.block.11.layer.0.SelfAttention.o.weight', 'encoder.block.4.layer.0.SelfAttention.q.weight', 'encoder.block.7.layer.1.layer_norm.weight', 'encoder.block.2.layer.1.layer_norm.weight', 'encoder.block.5.layer.0.SelfAttention.v.weight', 'encoder.block.3.layer.1.DenseReluDense.wi.weight', 'encoder.block.0.layer.0.layer_norm.weight', 'encoder.block.6.layer.0.SelfAttention.o.weight', 'encoder.block.3.layer.0.SelfAttention.o.weight', 'encoder.block.3.layer.0.layer_norm.weight', 'encoder.block.10.layer.1.DenseReluDense.wi.weight', 'encoder.block.4.layer.1.DenseReluDense.wo.weight', 
'encoder.block.9.layer.0.SelfAttention.k.weight', 'encoder.block.2.layer.0.SelfAttention.v.weight', 'encoder.block.1.layer.0.layer_norm.weight', 'encoder.block.1.layer.0.SelfAttention.o.weight', 'encoder.block.2.layer.0.SelfAttention.q.weight', 'encoder.block.8.layer.1.DenseReluDense.wo.weight', 'encoder.block.2.layer.1.DenseReluDense.wo.weight', 'encoder.block.9.layer.1.DenseReluDense.wi.weight', 'encoder.block.6.layer.0.SelfAttention.v.weight', 'encoder.block.9.layer.0.layer_norm.weight', 'encoder.block.8.layer.0.SelfAttention.q.weight', 'encoder.block.1.layer.0.SelfAttention.q.weight', 'encoder.block.8.layer.0.SelfAttention.o.weight', 'encoder.block.10.layer.1.layer_norm.weight', 'encoder.block.0.layer.0.SelfAttention.o.weight', 'encoder.block.1.layer.1.layer_norm.weight', 'encoder.block.6.layer.0.layer_norm.weight', 'encoder.block.3.layer.1.DenseReluDense.wo.weight', 'encoder.block.8.layer.0.SelfAttention.k.weight', 'encoder.block.6.layer.0.SelfAttention.k.weight', 'encoder.block.1.layer.1.DenseReluDense.wo.weight', 'encoder.block.8.layer.1.layer_norm.weight', 'encoder.block.0.layer.0.SelfAttention.q.weight', 'encoder.block.9.layer.0.SelfAttention.v.weight', 'encoder.block.4.layer.0.layer_norm.weight', 'encoder.block.4.layer.1.DenseReluDense.wi.weight', 'encoder.block.4.layer.1.layer_norm.weight', 'encoder.block.10.layer.0.SelfAttention.q.weight', 'encoder.block.0.layer.0.SelfAttention.k.weight', 'encoder.block.3.layer.0.SelfAttention.q.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

Would adding those wrapper modules on top of T5ForConditionalGeneration be the right thing to do? If so, are there any existing examples I can look at to understand how to implement it?
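
For reference, here is a minimal sketch of those two wrappers as I read them from the upstream FiD code and from the key names in the warning above: EncoderWrapper stores the T5 encoder as `self.encoder` (producing the `encoder.encoder.` prefix) and CheckpointWrapper stores each block as `self.module` (producing the `.module.` segment). Taking `n_passages` as a constructor argument is my simplification; upstream sets it per batch.

```python
import torch.nn as nn

class CheckpointWrapper(nn.Module):
    # Holds the wrapped T5 block as `module`, which is what injects the extra
    # ".module." segment into the checkpoint key names quoted above.
    def __init__(self, block):
        super().__init__()
        self.module = block

    def forward(self, *args, **kwargs):
        # Upstream optionally routes this through torch.utils.checkpoint to
        # trade compute for activation memory during training.
        return self.module(*args, **kwargs)

class EncoderWrapper(nn.Module):
    # Holds the T5 encoder stack as `encoder`; once this wrapper is assigned
    # to T5ForConditionalGeneration.encoder, every encoder key gains a second
    # "encoder." prefix ("encoder.encoder....").
    def __init__(self, encoder, n_passages):
        super().__init__()
        encoder.block = nn.ModuleList(CheckpointWrapper(b) for b in encoder.block)
        self.encoder = encoder
        self.n_passages = n_passages  # simplification: fixed at construction

    def forward(self, input_ids, attention_mask, **kwargs):
        # FiD's fusion trick: reshape (batch, n_passages * passage_len) into
        # (batch * n_passages, passage_len) so every passage is encoded
        # independently, then concatenate the hidden states back along the
        # sequence axis so the decoder cross-attends over all passages.
        bsz, total_len = input_ids.shape
        plen = total_len // self.n_passages
        input_ids = input_ids.view(bsz * self.n_passages, plen)
        attention_mask = attention_mask.view(bsz * self.n_passages, plen)
        outputs = self.encoder(input_ids, attention_mask, **kwargs)
        hidden = outputs[0].view(bsz, self.n_passages * plen, -1)
        return (hidden,) + outputs[1:]
```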

@patrickvonplaten (Contributor)

I think we should rename the weights here then -> @patil-suraj, I think you know best how to guide @bhavitvyamalik here.
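
A hedged sketch of that renaming, derived from the key patterns in the warning log above: collapse the EncoderWrapper and CheckpointWrapper prefixes so the FiD checkpoint keys match plain T5ForConditionalGeneration names. The checkpoint filename is a placeholder, and using the t5-base config for nq_reader_base is an assumption.

```python
import torch
from transformers import T5ForConditionalGeneration

# Placeholder path: point this at the FiD checkpoint weights file.
state_dict = torch.load("nq_reader_base/pytorch_model.bin", map_location="cpu")

renamed = {}
for key, value in state_dict.items():
    new_key = key
    # EncoderWrapper adds one "encoder." level: encoder.encoder.* -> encoder.*
    if new_key.startswith("encoder.encoder."):
        new_key = new_key.replace("encoder.encoder.", "encoder.", 1)
    # CheckpointWrapper adds ".module" inside each block: block.N.module.* -> block.N.*
    new_key = new_key.replace(".module.", ".", 1)
    renamed[new_key] = value

# Assumption: nq_reader_base uses a t5-base-sized config.
model = T5ForConditionalGeneration.from_pretrained("t5-base")
missing, unexpected = model.load_state_dict(renamed, strict=False)
print("still missing:", missing)
print("still unexpected:", unexpected)
```

If both lists come back (near-)empty, the renamed weights can be saved with `model.save_pretrained(...)` and reloaded without the warning.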

huggingface deleted a comment from the github-actions bot on Apr 18, 2022
patil-suraj added the WIP label on Apr 18, 2022