[WIP] T5v1.1 & MT5 #8488

patrickvonplaten · 2020-11-12T09:46:23Z

What does this PR do?

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline,
Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link
to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the
documentation guidelines, and
here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

shenfe · 2020-11-12T14:26:28Z

The model config for T5.1.1 may be wrong. For instance, T5.1.1.small should have num_layers=8 and num_heads=6.

See https://github.com/google-research/text-to-text-transfer-transformer/blob/master/t5/models/gin/models/t5.1.1.small.gin
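A minimal sketch of the kind of check this comment implies: compare a candidate config against the two values quoted above. Only num_layers=8 and num_heads=6 come from the comment; the helper name config_mismatches and the candidate values are hypothetical and chosen for illustration.

```python
# The two values quoted in the comment above for T5.1.1 small.
# Other config fields are deliberately left out.
EXPECTED_T5_1_1_SMALL = {"num_layers": 8, "num_heads": 6}


def config_mismatches(config, expected):
    """Return {field: (actual, expected)} for every field that differs."""
    return {
        key: (config.get(key), want)
        for key, want in expected.items()
        if config.get(key) != want
    }


# Hypothetical config under review; these numbers are made up.
candidate = {"num_layers": 6, "num_heads": 8}
print(config_mismatches(candidate, EXPECTED_T5_1_1_SMALL))
```

An empty dict means the candidate agrees with the expected values on every listed field.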

shenfe

Hi, there are some problems I found:

the model configs differ from the official ones
relu may be unnecessary in the FFN
the lm_head weight should not be shared with the enc/dec embedding

Please point it out if I got anything wrong :)

patrickvonplaten · 2020-11-12T15:08:39Z

The model config for T5.1.1 may be wrong. For instance, T5.1.1.small should have num_layers=8 and num_heads=6.

See https://github.com/google-research/text-to-text-transfer-transformer/blob/master/t5/models/gin/models/t5.1.1.small.gin

Thanks yeah, I implemented that.

The new model structure is now equal to mesh t5 v1.1.

If you download the t5v1.1 t5-small checkpoint and replace the corresponding path in check_t5_against_hf.py you can see that the models are equal.

There is still quite a bit of work to do: write more tests, do a lot of cleanup, improve the design, and check whether mT5 works with it.

shenfe · 2020-11-13T03:57:33Z

If you download the t5v1.1 t5-small checkpoint and replace the corresponding path in check_t5_against_hf.py you can see that the models are equal.

Hi, check_t5_against_hf.py still fails if I replace "Hello there" with a longer input text, such as "Hello there. Let's put more words in more languages than I originally thought."

shenfe · 2020-11-13T04:01:01Z

check_t5_against_hf.py

+loss = model(input_ids, labels=labels).loss
+mesh_tf_loss = -(labels.shape[-1] * loss.item())
+
+if mesh_tf_loss - score[0][0] < 1e-4:


Maybe better to use abs() here

I will delete this file eventually - it's just for now :-)
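To illustrate the abs() suggestion in this thread: a signed difference passes whenever the first score is much smaller than the second, so it cannot detect a mismatch in that direction. The sketch below uses made-up scores; the function names are hypothetical and are not taken from check_t5_against_hf.py.

```python
def signed_check(hf_score, mtf_score, tol=1e-4):
    """Comparison in the style of the diff above: signed difference only."""
    return hf_score - mtf_score < tol


def scores_match(hf_score, mtf_score, tol=1e-4):
    """Symmetric comparison: passes only if the scores are within tol."""
    return abs(hf_score - mtf_score) < tol


# Made-up scores that clearly disagree.
hf_score, mtf_score = -12.5, -7.25

print(signed_check(hf_score, mtf_score))  # True, although the scores differ
print(scores_match(hf_score, mtf_score))  # False
```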

patrickvonplaten · 2020-11-13T09:10:57Z

If you download the t5v1.1 t5-small checkpoint and replace the corresponding path in check_t5_against_hf.py you can see that the models are equal.

Hi, check_t5_against_hf.py still fails if I replace "Hello there" with a longer input text, such as "Hello there. Let's put more words in more languages than I originally thought."

Hmm, it works for me - do you experience that for T5v1.1 or mT5?

patrickvonplaten · 2020-11-13T09:13:37Z

src/transformers/modeling_t5v2.py

+        return self.weight * x
+
+
+class T5v2DenseReluDense(nn.Module):


This class differs significantly from the previous T5DenseReluDense: different weights are used here.

patrickvonplaten · 2020-11-13T09:14:47Z

src/transformers/modeling_t5v2.py

+        )
+
+        sequence_output = decoder_outputs[0]
+        # Rescale output before projecting on vocab


For T5v1.1 there is no rescaling because the input and output embeddings are not tied.

I agree with these changes. But the PyTorch T5.1.1 model still differs from the official TF version. I'm working on it too.
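The rescaling point in this thread can be sketched in plain Python: scale the decoder output before the vocabulary projection only when the input and output embeddings are tied. The factor d_model ** -0.5 is an assumption based on the original T5 behavior, and the function and flag names are illustrative, not the code in this PR.

```python
def prepare_for_lm_head(hidden, d_model, tie_word_embeddings):
    """Return the decoder output as it would be fed to the vocab projection.

    Tied embeddings (assumed original T5 behavior): scale by d_model ** -0.5.
    Untied embeddings (T5v1.1, per the comment above): no rescaling.
    """
    if tie_word_embeddings:
        scale = d_model ** -0.5
        return [h * scale for h in hidden]
    return list(hidden)


# Toy hidden vector with d_model = 4, so the scale factor is 0.5.
hidden = [2.0, 4.0, 6.0, 8.0]
print(prepare_for_lm_head(hidden, d_model=4, tie_word_embeddings=True))
print(prepare_for_lm_head(hidden, d_model=4, tie_word_embeddings=False))
```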

shenfe · 2020-11-14T09:00:52Z

If you download the t5v1.1 t5-small checkpoint and replace the corresponding path in check_t5_against_hf.py you can see that the models are equal.

Hi, check_t5_against_hf.py still fails if I replace "Hello there" with a longer input text, such as "Hello there. Let's put more words in more languages than I originally thought."

Hmm, it works for me - do you experience that for T5v1.1 or mT5?

Aha, the check passes now. Yesterday I made a mistake: when I changed the test input sentence in the check script, I didn't update the input length for the MTF model from 4 to a longer value like 128. So the MTF model and the PyTorch model actually received different inputs and, of course, produced different results.
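A small sketch of the mistake described above, assuming the fixed input length truncates or right-pads the token ids. The token ids, the pad id, and the helper name fit_to_length are made up for illustration; only the lengths 4 and 128 come from the comment.

```python
PAD_ID = 0  # assumed padding id, for illustration only


def fit_to_length(token_ids, length, pad_id=PAD_ID):
    """Truncate or right-pad token_ids to exactly `length` entries."""
    padded = list(token_ids) + [pad_id] * length
    return padded[:length]


# Twenty made-up token ids standing in for the longer test sentence.
long_ids = list(range(10, 30))

at_4 = fit_to_length(long_ids, 4)      # keeps only the first 4 tokens
at_128 = fit_to_length(long_ids, 128)  # keeps all 20 tokens, then padding

print(at_4)
print(at_128[: len(long_ids)] == long_ids)
```

With length 4 the model sees a truncated sentence, so comparing its score against a model that received the full input is not meaningful.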

Besides, if I add the z-loss to the CE loss at the end, the result differs from the MTF score again. I just found that MTF ignores z-loss when not training (code). So I think the MTF model score does not include z-loss, but its training loss does, and that term is absent from HF T5 training. Well, this is definitely not a blocking issue for now.
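A rough sketch of the z-loss behavior described above, assuming the common formulation in which the auxiliary term is z_loss * log(Z)^2, with Z the softmax normalizer. The coefficient 1e-4 and the function names are assumptions for illustration; this is not the MTF code.

```python
import math


def cross_entropy(logits, target, z_loss=0.0):
    """Cross-entropy for one position, plus an optional z-loss term."""
    m = max(logits)
    # log Z computed with the usual max-subtraction for numerical stability.
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    loss = log_z - logits[target]
    return loss + z_loss * log_z ** 2


def score_or_train_loss(logits, target, training, z_loss=1e-4):
    """Apply z-loss only in training mode, as the comment says MTF does."""
    return cross_entropy(logits, target, z_loss if training else 0.0)


logits = [2.0, 1.0, 0.0]  # made-up logits
eval_loss = score_or_train_loss(logits, target=0, training=False)
train_loss = score_or_train_loss(logits, target=0, training=True)
print(eval_loss, train_loss, train_loss - eval_loss)
```

Under these assumptions the two values differ by exactly the z-loss term, which is why adding it to the CE loss moves the result away from the evaluation-time score.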

Appreciate your great work :)

patrickvonplaten · 2020-11-15T21:04:29Z

Closing in favor of #8552.

add new t5 model

d1cc3de

patrickvonplaten changed the title from "[T5v1.1 & MT5] add new t5 model" to "[WIP] T5v1.1 & MT5" on Nov 12, 2020

patrickvonplaten mentioned this pull request Nov 12, 2020

🌟 T5 V1.1 #6285

Closed

3 tasks

shenfe suggested changes Nov 12, 2020


confirm that models are equal

e34fccc

patrickvonplaten mentioned this pull request Nov 12, 2020

T5 Conversion from Original Tensorflow Produce rubbish Text #7791

Closed

4 tasks

add mt5 check

b92cc8e

shenfe reviewed Nov 13, 2020


patrickvonplaten commented Nov 13, 2020


patrickvonplaten closed this Nov 15, 2020
