Add support for XLM-R XL and XXL models #13210

Conversation
Hi @Soonhwan-Kwon, sorry for the late reply! I discussed this topic with @patrickvonplaten a while ago and we came to the conclusion that it would be better to have a new model/class name for it. I've also tested the model implementation on a GLUE task, but the result was not very good. The model is so large that it was impossible for me to test it on a GPU - even with batch size 1. Then I did some DeepSpeed tests, but on my V100 I would have to wait more than 3 days for the smallest GLUE task - and the final result was not performing well 🤔
@stefan-it Thank you for the reply. I have an A100 80GB machine if you need any cross-check.
@Soonhwan-Kwon @stefan-it Can you share your DeepSpeed configuration for loading XLMR-XL? I'm getting NaN as the loss from DeepSpeed after using your code changes for the conversion. @Soonhwan-Kwon Do you have a plan to create a standalone file for XLMRobertaExtraLarge? I ask because your current file change breaks the conversion for the large and base models.
Maybe I can paste my fine-tuning command, which loads the XLM-Roberta-XLarge model converted with @Soonhwan-Kwon's script. You could run it and double-check against your setup:

deepspeed --num_gpus=8 run_xnli.py --model_name_or_path /mnt/xlm-roberta-xlarge \
--deepspeed ds_config_zero3.json \
--language zh \
--train_language en \
--do_predict \
--max_seq_length 128 \
--per_device_train_batch_size 4 \
--learning_rate 2e-6 \
--logging_steps 100 \
--eval_steps 100 \
--save_steps 5000 \
--num_train_epochs 5 \
--output_dir /mnt/output_xlmr \
--cache_dir /mnt/cache \
--fp16 \
--overwrite_output_dir \
--evaluation_strategy "steps" \
--dataloader_num_workers 8 \
--use_fast_tokenizer False
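For reference, the ds_config_zero3.json referenced in the command above is a standard ZeRO-3 configuration. A minimal sketch is given below as a Python script that writes the JSON file; the values are assumptions for illustration, not the exact config used in this thread (the "auto" entries are filled in by the transformers Trainer integration).

```python
# Illustrative ZeRO-3 DeepSpeed config for the command above; values are
# assumptions, not the exact file used by the commenters.
import json

ds_config = {
    "fp16": {
        "enabled": "auto",
        "loss_scale": 0,
        "initial_scale_power": 16,
    },
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}

with open("ds_config_zero3.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```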
@@ -81,7 +81,9 @@ def __init__(self, config):

        # self.LayerNorm is not snake-cased to stick with TensorFlow model variable name and be able to load
        # any TensorFlow checkpoint file
        self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
        self.normalize_embeddings = config.normalize_embeddings
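For context on the added line, a flag like normalize_embeddings would typically make the embedding-level LayerNorm conditional, since the XL/XXL checkpoints apply normalization differently than the base/large models. A simplified, hypothetical sketch (not the actual transformers class):

```python
# Sketch only: how a config flag such as normalize_embeddings could gate the
# embedding LayerNorm. This is a simplified stand-in, not the real module.
import torch
from torch import nn


class SketchEmbeddings(nn.Module):
    def __init__(self, vocab_size, hidden_size, layer_norm_eps, normalize_embeddings):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, hidden_size)
        self.LayerNorm = nn.LayerNorm(hidden_size, eps=layer_norm_eps)
        self.normalize_embeddings = normalize_embeddings

    def forward(self, input_ids):
        embeddings = self.word_embeddings(input_ids)
        # XL/XXL checkpoints place LayerNorm differently than base/large,
        # so the embedding-level normalization becomes optional.
        if self.normalize_embeddings:
            embeddings = self.LayerNorm(embeddings)
        return embeddings


# Quick check of both code paths:
emb = SketchEmbeddings(vocab_size=100, hidden_size=8, layer_norm_eps=1e-5, normalize_embeddings=True)
print(emb(torch.tensor([[1, 2, 3]])).shape)  # torch.Size([1, 3, 8])
```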
Thanks a lot for the PR @Soonhwan-Kwon! For transformers we have a rather strict rule not to adapt existing modeling files for new model checkpoints, so in this case it would be great if you could create a new modeling_xlm_roberta_xl.py file.
Thank you for the review; I have started creating xlm_roberta_xl as you suggested.
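For illustration, a bare-bones skeleton of what such a standalone module could start from is sketched below. The class and file names follow the reviewer's suggestion, the vocabulary size matches the converted checkpoint's output shape shown further down, and the remaining defaults follow the XL settings reported in the paper; everything else is an assumption rather than the final implementation.

```python
# configuration_xlm_roberta_xl.py -- skeleton only, not the final implementation.
from transformers import PretrainedConfig


class XLMRobertaXLConfig(PretrainedConfig):
    model_type = "xlm-roberta-xl"

    def __init__(
        self,
        vocab_size=250880,        # matches the converted checkpoint's output shape
        hidden_size=2560,         # XL settings as reported in the paper
        num_hidden_layers=36,
        num_attention_heads=32,
        intermediate_size=10240,
        layer_norm_eps=1e-5,
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.num_hidden_layers = num_hidden_layers
        self.num_attention_heads = num_attention_heads
        self.intermediate_size = intermediate_size
        self.layer_norm_eps = layer_norm_eps
```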
This PR adds support for the newly released XL and XXL models for XLM-R. These models are described in the "Larger-Scale Transformers for Multilingual Masked Language Modeling" paper.
I compared the fairseq and transformers implementations side by side and verified that they produce the same output:

torch.Size([1, 10, 250880]) torch.Size([1, 10, 250880])
max_absolute_diff = 0.00022125244140614
Do both models output the same tensors? 🔥
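A rough sketch of that side-by-side check, in the spirit of the existing RoBERTa conversion scripts; the checkpoint paths and the use of XLMRobertaForMaskedLM for the converted model are assumptions:

```python
# Compare fairseq and transformers outputs on the same input; paths are placeholders.
import torch
from fairseq.models.roberta import XLMRModel
from transformers import XLMRobertaForMaskedLM

fairseq_model = XLMRModel.from_pretrained("/path/to/xlmr.xl", checkpoint_file="model.pt")
fairseq_model.eval()

hf_model = XLMRobertaForMaskedLM.from_pretrained("/path/to/converted-xlm-roberta-xlarge")
hf_model.eval()

input_ids = fairseq_model.encode("Hello world!").unsqueeze(0)  # shape: (1, seq_len)

with torch.no_grad():
    fairseq_logits = fairseq_model.model(input_ids)[0]
    hf_logits = hf_model(input_ids)[0]

print(fairseq_logits.shape, hf_logits.shape)
max_absolute_diff = torch.max(torch.abs(fairseq_logits - hf_logits)).item()
print(f"max_absolute_diff = {max_absolute_diff}")
print("Do both models output the same tensors?", "🔥" if max_absolute_diff < 1e-3 else "💩")
```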
Since the fairseq RoBERTa to transformers conversion was written a long time ago, the transformers architecture has drifted quite far from the fairseq code it originally started from, which makes it confusing to get the conversion code right. I adjusted the transformers code to accommodate the fairseq model structure.
The original PR #12082 was closed by its author @stefan-it, and the PR (https://github.com/stefan-it/transformers/pull/1) that I opened against his fork about 40 days ago got no response, so I opened this new PR.