Error when running MegaForCausalLM example code in Docs #22974

Tylersuard · 2023-04-24T22:45:17Z

System Info

Most recent version of Tranformers from Githup, on Google Colab

Who can help?

@ArthurZucker @younesbelkada

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

This is the example code from the documentation for MegaForCausalLM (https://huggingface.co/docs/transformers/main/model_doc/mega):

from transformers import AutoTokenizer, MegaForCausalLM, AutoConfig
import torch

tokenizer = AutoTokenizer.from_pretrained("mnaylor/mega-base-wikitext")
config = AutoConfig.from_pretrained("mnaylor/mega-base-wikitext")
config.is_decoder = True
config.bidirectional = False
model = MegaForCausalLM.from_pretrained("mnaylor/mega-base-wikitext", config=config)

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)

prediction_logits = outputs.logits

After installing Transformers from source, when I run the above code snippet on Colab, I get this error:

RuntimeError: Error(s) in loading state_dict for MegaForCausalLM:
size mismatch for mega.layers.0.mega_layer.ema_gate.damping_factor: copying a param with shape torch.Size([256, 16, 1]) from checkpoint, the shape in current model is torch.Size([128, 16, 1]).
size mismatch for mega.layers.0.mega_layer.ema_gate.decay_factor: copying a param with shape torch.Size([256, 16, 1]) from checkpoint, the shape in current model is torch.Size([128, 16, 1]).
size mismatch for mega.layers.0.mega_layer.ema_gate.ema_expansion_matrix: copying a param with shape torch.Size([256, 16, 1]) from checkpoint, the shape in current model is torch.Size([128, 16, 1]).
size mismatch for mega.layers.0.mega_layer.ema_gate.kernel_projection_matrix: copying a param with shape torch.Size([256, 16]) from checkpoint, the shape in current model is torch.Size([128, 16]).
size mismatch for mega.layers.1.mega_layer.ema_gate.damping_factor: copying a param with shape torch.Size([256, 16, 1]) from checkpoint, the shape in current model is torch.Size([128, 16, 1]).
size mismatch for mega.layers.1.mega_layer.ema_gate.decay_factor: copying a param with shape torch.Size([256, 16, 1]) from checkpoint, the shape in current model is torch.Size([128, 16, 1]).
size mismatch for mega.layers.1.mega_layer.ema_gate.ema_expansion_matrix: copying a param with shape torch.Size([256, 16, 1]) from checkpoint, the shape in current model is torch.Size([128, 16, 1]).
size mismatch for mega.layers.1.mega_layer.ema_gate.kernel_projection_matrix: copying a param with shape torch.Size([256, 16]) from checkpoint, the shape in current model is torch.Size([128, 16]).
size mismatch for mega.layers.2.mega_layer.ema_gate.damping_factor: copying a param with shape torch.Size([256, 16, 1]) from checkpoint, the shape in current model is torch.Size([128, 16, 1]).
size mismatch for mega.layers.2.mega_layer.ema_gate.decay_factor: copying a param with shape torch.Size([256, 16, 1]) from checkpoint, the shape in current model is torch.Size([128, 16, 1]).
size mismatch for mega.layers.2.mega_layer.ema_gate.ema_expansion_matrix: copying a param with shape torch.Size([256, 16, 1]) from checkpoint, the shape in current model is torch.Size([128, 16, 1]).
size mismatch for mega.layers.2.mega_layer.ema_gate.kernel_projection_matrix: copying a param with shape torch.Size([256, 16]) from checkpoint, the shape in current model is torch.Size([128, 16]).
size mismatch for mega.layers.3.mega_layer.ema_gate.damping_factor: copying a param with shape torch.Size([256, 16, 1]) from checkpoint, the shape in current model is torch.Size([128, 16, 1]).
size mismatch for mega.layers.3.mega_layer.ema_gate.decay_factor: copying a param with shape torch.Size([256, 16, 1]) from checkpoint, the shape in current model is torch.Size([128, 16, 1]).
size mismatch for mega.layers.3.mega_layer.ema_gate.ema_expansion_matrix: copying a param with shape torch.Size([256, 16, 1]) from checkpoint, the shape in current model is torch.Size([128, 16, 1]).
size mismatch for mega.layers.3.mega_layer.ema_gate.kernel_projection_matrix: copying a param with shape torch.Size([256, 16]) from checkpoint, the shape in current model is torch.Size([128, 16]).
You may consider adding ignore_mismatched_sizes=True in the model from_pretrained method.

Expected behavior

The pretrained model would load all weights without error

The text was updated successfully, but these errors were encountered:

ArthurZucker · 2023-04-25T08:03:14Z

Hey! Thanks for reporting! This is because the default configuration argument of bidirectional is True. When setting it to False you reduce the size of the ema matrix. If you still want to use it, ignore_mismatched_sizes=True will help you initialize the model.

Tylersuard · 2023-05-14T19:14:59Z

Thank you for your response. When I set ignore_mismatched_sizes=True the code works. However, the example code in the docs is still incorrect.

amyeroberts · 2023-05-15T11:03:42Z

@Tylersuard Yep, you're right! Would you like to open a PR to update the docs to get the git contribution for spotting?

Tylersuard · 2023-05-16T00:12:54Z

@amyeroberts Absolutely!

Tylersuard · 2023-05-16T00:32:59Z

Ok! I just made the PR here. #23382

Tylersuard changed the title ~~Error when running example code in Docs~~ Error when running MegaForCausalLM example code in Docs Apr 24, 2023

ArthurZucker self-assigned this Apr 25, 2023

ArthurZucker closed this as completed Apr 25, 2023

Tylersuard mentioned this issue May 16, 2023

Debug example code for MegaForCausalLM #23382

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error when running MegaForCausalLM example code in Docs #22974

Error when running MegaForCausalLM example code in Docs #22974

Tylersuard commented Apr 24, 2023 •

edited by ArthurZucker

Loading

ArthurZucker commented Apr 25, 2023 •

edited

Loading

Tylersuard commented May 14, 2023

amyeroberts commented May 15, 2023

Tylersuard commented May 16, 2023

Tylersuard commented May 16, 2023

Error when running MegaForCausalLM example code in Docs #22974

Error when running MegaForCausalLM example code in Docs #22974

Comments

Tylersuard commented Apr 24, 2023 • edited by ArthurZucker Loading

System Info

Who can help?

Information

Tasks

Reproduction

Expected behavior

ArthurZucker commented Apr 25, 2023 • edited Loading

Tylersuard commented May 14, 2023

amyeroberts commented May 15, 2023

Tylersuard commented May 16, 2023

Tylersuard commented May 16, 2023

Tylersuard commented Apr 24, 2023 •

edited by ArthurZucker

Loading

ArthurZucker commented Apr 25, 2023 •

edited

Loading