
Add support for XLM-R XL and XXL models by modeling_xlm_roberta_xl.py #13727

Merged: 27 commits, Jan 29, 2022

Conversation

Soonhwan-Kwon
Contributor

@Soonhwan-Kwon Soonhwan-Kwon commented Sep 24, 2021

This PR adds support for the newly released XL and XXL models for XLM-R. These models are described in the "Larger-Scale Transformers for Multilingual Masked Language Modeling" paper.

Thank you to @patrickvonplaten and @stefan-it for the review I got on #13210. Based on that review, I added modeling_xlm_roberta_xl.py and a conversion script, convert_xlm_roberta_xl_original_pytorch_checkpoint_to_pytorch.py.

I compared fairseq and transformers side by side and verified that the outputs match:

torch.Size([1, 11, 250880]) torch.Size([1, 11, 250880])
max_absolute_diff = 0.000186920166015625
Do both models output the same tensors? 🔥
Saving model to converted_xlmr_xl
Configuration saved in converted_xlmr_xl/config.json
Model weights saved in converted_xlmr_xl/pytorch_model.bin
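
(For context, a minimal sketch of how such a side-by-side check can be run, modeled on the existing fairseq-to-transformers RoBERTa conversion scripts; the torch.hub model name, local path, and sample sentence here are assumptions, not taken verbatim from this PR:)

```python
import torch
from transformers import XLMRobertaXLForMaskedLM

# Original fairseq checkpoint (cached under ~/.cache/torch/pytorch_fairseq);
# the hub name "xlmr.xl" is an assumption.
xlmr = torch.hub.load("pytorch/fairseq", "xlmr.xl")
xlmr.eval()

# Checkpoint written by the conversion script in this PR.
hf_model = XLMRobertaXLForMaskedLM.from_pretrained("converted_xlmr_xl")
hf_model.eval()

# The converted model keeps fairseq's vocabulary numbering, so the same ids
# can be fed to both models.
input_ids = xlmr.encode("Hello world! cécé herlolip").unsqueeze(0)

with torch.no_grad():
    their_output = xlmr.model(input_ids)[0]   # fairseq masked-LM logits
    our_output = hf_model(input_ids).logits   # transformers masked-LM logits

print(their_output.shape, our_output.shape)
max_absolute_diff = (their_output - our_output).abs().max().item()
print(f"max_absolute_diff = {max_absolute_diff}")
print("Do both models output the same tensors?",
      "🔥" if torch.allclose(their_output, our_output, atol=1e-3) else "💩")
```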

Since the fairseq RoBERTa to transformers conversion was written a long time ago, the transformers architecture has drifted quite far from the fairseq code it originally started from, which made it confusing to write the conversion correctly. I synced the transformers code to accommodate the fairseq model structure.

  • add test for XLM-R XL and XXL
  • upload model for XLM-R XL and XXL to official repo

@patrickvonplaten
Contributor

Thanks for the PR @Soonhwan-Kwon!

Could you also add a test file and some integration tests? :-)

@Soonhwan-Kwon
Contributor Author

Soonhwan-Kwon commented Oct 13, 2021

@patrickvonplaten I started working on the test file. It seems the tests need the models uploaded to the official repo, but how can I upload the model files for xlm-roberta-xl or xlm-roberta-xxl to the official repo?

@huggingface huggingface deleted a comment from github-actions bot Nov 11, 2021
@patrickvonplaten
Contributor

Hey @Soonhwan-Kwon,

Thanks a lot for working on this and sorry for replying so late!
Would it be ok to upload the checkpoints under your name on the Hub for now so that the tests pass? Then, as a last step, we will move the checkpoints to the official organization.

Let me know if you need some help fixing the last steps :-)
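
(For reference, a minimal sketch of what uploading a converted checkpoint under a personal Hub namespace could look like, assuming the standard push_to_hub workflow; the repo name and local path are illustrative, not the exact ones used here:)

```python
# Requires being logged in first, e.g. via `huggingface-cli login`.
from transformers import XLMRobertaTokenizer, XLMRobertaXLForMaskedLM

model = XLMRobertaXLForMaskedLM.from_pretrained("converted_xlmr_xl")
# The sentencepiece model is the same as XLM-R's, so the existing tokenizer can be reused.
tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-large")

model.push_to_hub("Soonhwan-Kwon/xlm-roberta-xlarge")      # hypothetical repo id
tokenizer.push_to_hub("Soonhwan-Kwon/xlm-roberta-xlarge")
```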

@Soonhwan-Kwon
Contributor Author


Thank you for the reply. I'm in the middle of uploading the models, but it takes time for the xxlarge model (over 24 GB).

@Soonhwan-Kwon
Contributor Author

Soonhwan-Kwon commented Nov 15, 2021

@patrickvonplaten I have uploaded all the models, but I have no idea how to handle the last steps because I'm kind of a newbie here. How can I finish them? Thank you in advance.

@patrickvonplaten
Contributor

@Soonhwan-Kwon, could you maybe also add a configuration file (just copy the xlm-roberta one) and also add a full test suite for the model? :-)

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@Soonhwan-Kwon
Contributor Author


Sorry for the late response. I already added config.json, so is it the tokenizer.json you're talking about? I also added a simple test for the models (tests/test_modeling_xlm_roberta_xl.py), but where can I find the full test suite?

@patrickvonplaten
Contributor

Hey @Soonhwan-Kwon,

I meant more a new configuration_xlm_roberta_xl.py Python file that is more or less a copy of configuration_xlm_roberta.py :-) But I see that the configs are exactly the same, so maybe we can leave it as is.

@sgugger @LysandreJik - This PR adds the checkpoints of https://ai.facebook.com/blog/-xlm-r-state-of-the-art-cross-lingual-understanding-through-self-supervision/ to transformers. The model is essentially a "scaled-up" version of https://huggingface.co/docs/transformers/master/en/model_doc/xlmroberta#overview . Since the "scaled-up" version has a significantly different architecture (layer_norm is used very differently, among other things), we decided to make a new modeling_xlm_roberta_xl.py model file. Now, would it be ok for you to a) not have a corresponding configuration_xlm_roberta_xl.py and just reuse the configuration_xlm_roberta.py code, or would you prefer b) adding a new configuration_xlm_roberta_xl.py file for consistency? I'm a bit indifferent here, but do slightly prefer b). What do you think?

@Soonhwan-Kwon - there are some failing tests which I think can partly be solved by rebasing onto current master. Otherwise, if it's ok with you, I'm also happy to dive into the PR and help you finish the last parts of it. Let me know what you prefer :-)
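
(To illustrate the layer_norm difference mentioned above: XLM-R uses the usual post-LayerNorm residual blocks, while the XL/XXL checkpoints use pre-LayerNorm blocks, fairseq's normalize_before setting, plus a final LayerNorm after the last layer. A deliberately simplified sketch, not the actual transformers code:)

```python
import torch
from torch import nn

def post_ln_block(x, sublayer, norm):
    # XLM-R / RoBERTa style: sublayer, residual, then LayerNorm.
    return norm(x + sublayer(x))

def pre_ln_block(x, sublayer, norm):
    # XLM-R XL/XXL style: LayerNorm on the sublayer input, residual kept on
    # the un-normalized stream.
    return x + sublayer(norm(x))

# Tiny usage example with a feed-forward sublayer.
hidden = torch.randn(1, 11, 16)
norm = nn.LayerNorm(16)
ffn = nn.Sequential(nn.Linear(16, 64), nn.GELU(), nn.Linear(64, 16))

out_post = post_ln_block(hidden, ffn, norm)  # RoBERTa-style
out_pre = pre_ln_block(hidden, ffn, norm)    # XL/XXL-style
```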

@LysandreJik
Member

Hey @Soonhwan-Kwon, thanks a lot for your PR!!

@patrickvonplaten, I prefer b): a lot of the library is built on the assumption that you have one configuration file/object per modeling file/model object. Since we've authorized auto models to map one configuration to multiple models this isn't as much of an issue as it could have been in the past, but I'm positive we'll find edge cases where it doesn't work as well as we expect, simply because that assumption no longer holds.

@sgugger
Collaborator

sgugger commented Dec 13, 2021

Also, since this counts as a new architecture, there should be a new folder grouping this modeling file and configuration file together, instead of putting everything in the xlm-roberta folder.
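
(Following the usual one-folder-per-model layout in transformers, that would give roughly the structure below; the folder and file names mirror the ones touched by this PR:)

```
src/transformers/models/xlm_roberta_xl/
├── __init__.py
├── configuration_xlm_roberta_xl.py
├── convert_xlm_roberta_xl_original_pytorch_checkpoint_to_pytorch.py
└── modeling_xlm_roberta_xl.py
```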

@Soonhwan-Kwon
Contributor Author


@patrickvonplaten Sure, I would be glad if you could help with the last parts; feel free to dive into this PR.

@Soonhwan-Kwon
Contributor Author

@patrickvonplaten I added you as a collaborator on my repo, in case you need access.

@patrickvonplaten
Contributor

Thanks @Soonhwan-Kwon - I'll try to tackle this tomorrow :-)

@stefan-it
Collaborator

Thanks for working on that @Soonhwan-Kwon. I made some minor suggestions and will look at the tokenization part now (to check if there are any differences between XLM-R and XLM-R-XL/XXL) :)

@stefan-it
Collaborator

The tokenization part is working as expected. Here are some details:

  • The underlying SentencePiece models (XLM-R and XLM-R-XL) are identical (checked via torch.hub.load, which downloads the models and stores them under ~/.cache/torch/pytorch_fairseq); the checksums are the same.

  • Tokenizer mapping (fairseq to spm model) is thankfully the same as for XLM-R; here is the mapping I documented:

# Original fairseq vocab and spm vocab must be "aligned":
# Vocab | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
# -------- | ------- | ------- | ------ | ------- | --- | --- | --- | ----- | ----- | ----
# fairseq | '<s>' | '<pad>' | '</s>' | '<unk>' | ',' | '.' | '▁' | 's' | '▁de' | '-'
# spm | '<unk>' | '<s>' | '</s>' | ',' | '.' | '▁' | 's' | '▁de' | '-' | '▁a'

  • But where does this vocab size "mismatch" come from? XLM-R has a vocab size of 250,002, whereas XLM-R-XL has 250,880.

Fairseq comes with its own dictionary file (dict.txt): for XLM-R it has 249,997 entries, and for XLM-R-XL it has 250,875 entries.

The dictionary file for XLM-R-XL is the same as for XLM-R, except that it additionally contains madeupword tokens, ranging from madeupword0 to madeupword877, at the end (the sketch below shows how these numbers line up).
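
(A short sketch of how those numbers add up, assuming the same id-alignment convention as the existing XLMRobertaTokenizer; the helper below is illustrative, not the tokenizer's actual code:)

```python
# fairseq reserves the first four ids for its own special tokens, so ordinary
# sentencepiece pieces are shifted by +1 (spm id 0 is '<unk>', which fairseq
# maps to id 3 instead).
FAIRSEQ_SPECIAL_IDS = {"<s>": 0, "<pad>": 1, "</s>": 2, "<unk>": 3}
FAIRSEQ_OFFSET = 1

def piece_to_fairseq_id(piece, sp_model):
    if piece in FAIRSEQ_SPECIAL_IDS:
        return FAIRSEQ_SPECIAL_IDS[piece]
    return sp_model.PieceToId(piece) + FAIRSEQ_OFFSET

# The vocab sizes then add up as follows:
#   XLM-R:    249,997 dict entries + 4 specials + 1 <mask> = 250,002
#   XLM-R-XL: 250,875 dict entries + 4 specials + 1 <mask> = 250,880
# i.e. the extra 878 entries are exactly madeupword0 ... madeupword877.
```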

@sgugger sgugger left a comment (Collaborator)

Looking good! Careful with some `Copied from` statements that haven't been adapted to the model name.

(Excerpt from the documentation under review:)
"…not require `lang` tensors to understand which language is used, and should be able to determine the correct language from the input ids."

This model was contributed by [Soonhwan-Kwon](https://github.com/Soonhwan-Kwon) and [stefan-it](https://huggingface.co/stefan-it). The original code can be found [here](https://github.com/pytorch/fairseq/tree/master/examples/xlmr).

Mentioned you here @Soonhwan-Kwon and @stefan-it

@patrickvonplaten patrickvonplaten left a comment (Contributor)

UPDATE: The PR should be ready for merge now. The checkpoints have been moved to Facebook's org (https://huggingface.co/models?other=xlm-roberta-xl) and I added some model cards. @Soonhwan-Kwon, I've made you the "main" contributor for this model.
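
(For reference, a minimal usage sketch after the move; the exact repo ids below are my assumption of how the checkpoints are named under the Facebook org:)

```python
import torch
from transformers import AutoTokenizer, XLMRobertaXLModel

tokenizer = AutoTokenizer.from_pretrained("facebook/xlm-roberta-xl")
model = XLMRobertaXLModel.from_pretrained("facebook/xlm-roberta-xl")

inputs = tokenizer("Hello world!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```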

@patrickvonplaten
Contributor

@stefan-it - it would also be great if you could do a final review :-)

@sgugger sgugger left a comment (Collaborator)

Looking good!

patrickvonplaten and others added 3 commits January 29, 2022 13:04
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
@patrickvonplaten patrickvonplaten merged commit e09473a into huggingface:master Jan 29, 2022
@patrickvonplaten patrickvonplaten deleted the xlm_xl branch January 29, 2022 12:42
@stefan-it
Collaborator

Really cool! I'm currently running experiments on token classification with that new model 🤗

@Soonhwan-Kwon
Contributor Author

@patrickvonplaten @sgugger @stefan-it Thank you for the merge, it was a great experience, and I came to respect the transformers committers. Also, the revert below was just a misclick, sorry.

@stefan-it stefan-it mentioned this pull request Feb 3, 2023