Conversation

lewtun (Member) commented Apr 28, 2022

What does this PR do?

This PR removes masked image modeling from the list of supported features in the ONNX exporter. As explained by @NielsRogge, BEiT cannot be loaded with the AutoModelForMaskedImageModeling class:

Well yeah that's because BEiT does masked image modeling by predicting visual tokens of a VQ-VAE, whereas the other ones predict pixel values (RGB) as in the SimMIM paper. So I'm afraid BEiT cannot be added to this auto class.
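For concreteness, a minimal sketch of the mismatch between the two heads (illustrative only, not part of this PR; it assumes the `use_mask_token` flag on `BeitConfig` and the SimMIM-style decoder on `ViTForMaskedImageModeling`):

from transformers import BeitConfig, BeitForMaskedImageModeling
from transformers import ViTConfig, ViTForMaskedImageModeling

# BEiT's MIM head classifies each patch over a VQ-VAE codebook of visual tokens
beit = BeitForMaskedImageModeling(BeitConfig(use_mask_token=True))
print(beit.lm_head)  # Linear(in_features=768, out_features=8192, bias=True)

# A SimMIM-style head instead reconstructs RGB pixel values with a small decoder
vit = ViTForMaskedImageModeling(ViTConfig())
print(vit.decoder)  # Conv2d + PixelShuffle back to pixel space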

I've also added a note to the BEiT docs to help users who aren't aware of these details, and checked that the slow ONNX tests pass with:

RUN_SLOW=1 pytest tests/onnx/test_onnx_v2.py -s

Edit: we should merge this after #16981 to ensure the RoFormer tests pass first

lewtun requested review from sgugger and NielsRogge on Apr 28, 2022 at 05:16
HuggingFaceDocBuilderDev commented Apr 28, 2022

The documentation is not available anymore as the PR was closed or merged.

NielsRogge (Contributor) commented

Hi,

There's a reason I haven't added BEiT to the auto classes: it can't be used with the run_mim.py script, because BEiT handles masked image modeling differently from the other models (which follow the approach defined in the SimMIM paper). This may confuse users, so maybe we should properly document that BEiT is not the same as the others.

lewtun (Member, Author) commented Apr 28, 2022

Ah I see, but isn't it a bit odd to exclude BEiT just because it isn't compatible with our example scripts?

For instance, is there anything fundamentally wrong with loading BeitForMaskedImageModeling via the autoclass if I'm rolling my own masked image modeling code?

If not, I'd prefer to keep BEiT in the auto classes and emit the warning inside the run_mim.py script if a user tries to run it with this architecture.
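For example, something like the following runs fine with the dedicated class (a rough sketch with a randomly initialised model; it assumes the `use_mask_token` flag on `BeitConfig`, and the shapes follow the default config):

import torch
from transformers import BeitConfig, BeitForMaskedImageModeling

config = BeitConfig(use_mask_token=True)
model = BeitForMaskedImageModeling(config)

pixel_values = torch.rand(1, 3, config.image_size, config.image_size)
num_patches = (config.image_size // config.patch_size) ** 2
bool_masked_pos = torch.zeros(1, num_patches, dtype=torch.bool)
bool_masked_pos[:, : num_patches // 2] = True  # mask the first half of the patches

outputs = model(pixel_values=pixel_values, bool_masked_pos=bool_masked_pos)
# Each masked patch gets logits over the 8192-entry visual-token vocabulary
print(outputs.logits.shape)  # torch.Size([1, 196, 8192])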

lewtun (Member, Author) commented Apr 28, 2022

Hmm, maybe there is a fundamental issue with using BEiT in the auto classes, as I'm seeing the torch tests fail with:

self = BeitEmbeddings(
  (patch_embeddings): PatchEmbeddings(
    (projection): Conv2d(3, 32, kernel_size=(2, 2), stride=(2, 2))
  )
  (dropout): Dropout(p=0.1, inplace=False)
)
pixel_values = tensor([[[[7.7614e-01, 1.7656e-01, 6.0460e-01,  ..., 3.9106e-01,
           5.2019e-01, 8.9339e-01],
          [2.7568...1, 9.9367e-01],
          [9.4963e-01, 1.6943e-01, 9.7946e-01,  ..., 1.9085e-01,
           1.9910e-01, 4.6059e-02]]]])
bool_masked_pos = tensor([[0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        ...,
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0]])

    def forward(self, pixel_values: torch.Tensor, bool_masked_pos: Optional[torch.BoolTensor] = None) -> torch.Tensor:
    
        embeddings = self.patch_embeddings(pixel_values)
        batch_size, seq_len, _ = embeddings.size()
    
        cls_tokens = self.cls_token.expand(batch_size, -1, -1)
        if bool_masked_pos is not None:
>           mask_tokens = self.mask_token.expand(batch_size, seq_len, -1)
E           AttributeError: 'NoneType' object has no attribute 'expand'

NielsRogge (Contributor) commented Apr 28, 2022

Well yeah that's because BEiT does masked image modeling by predicting visual tokens of a VQ-VAE, whereas the other ones predict pixel values (RGB) as in the SimMIM paper. So I'm afraid BEiT cannot be added to this auto class.

lewtun (Member, Author) commented Apr 28, 2022

OK, thanks for the clarification. I'll remove this feature from the ONNX export and add a note to the BEiT docs :)

@@ -59,6 +59,12 @@ Tips:
`use_relative_position_bias` attribute of [`BeitConfig`] to `True` in order to add
position embeddings.

<Tip warning={true}>

BEiT does masked image modeling by predicting the visual tokens of a Vector-Quantized Variational Autoencoder (VQ-VAE), whereas other vision models like ViT and DeiT predict RGB pixel values. The [`AutoModelForMaskedImageModeling`] class only supports pixel-based masked image modeling, so you will need to use [`BeitForMaskedImageModeling`] directly if you wish to do masked image modeling with BEiT.
lewtun (Member, Author) commented Apr 28, 2022

Not sure if this should be on the main doc page or within the docstring for the BeitForMaskedImageModeling class. Happy to move it if you want!

Edit: decided it made more sense to put this in the docstring itself in eca26be
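For readers landing on this tip, the recommended pattern looks roughly like this (a hedged sketch; the checkpoint name is the MIM-pretrained BEiT checkpoint on the Hub):

from transformers import AutoModelForMaskedImageModeling, BeitForMaskedImageModeling

# The auto class raises a ValueError for BEiT, since it is deliberately
# absent from the pixel-based masked image modeling mapping:
# model = AutoModelForMaskedImageModeling.from_pretrained("microsoft/beit-base-patch16-224-pt22k")

# Load the dedicated class directly instead:
model = BeitForMaskedImageModeling.from_pretrained("microsoft/beit-base-patch16-224-pt22k")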

lewtun changed the title from "Add BEIT to masked image modeling autoclasses" to "Remove masked image modeling from BEIT ONNX export" on Apr 28, 2022
"default": OrderedDict({"last_hidden_state": {0: "batch", 1: "sequence"}}),
"image-classification": OrderedDict({"logits": {0: "batch", 1: "sequence"}}),
"masked-im": OrderedDict({"logits": {0: "batch", 1: "sequence"}}),
lewtun (Member, Author) commented

Since I had to add this feature to support masked image modeling in general, I also went ahead and rearranged these features alphabetically, as it was getting annoying to inspect what was available.

onnx_config_cls=MBartOnnxConfig,
),
# BEiT cannot be used with the masked image modeling autoclass, so this feature is excluded here
"beit": supported_features_mapping("default", "image-classification", onnx_config_cls=BeitOnnxConfig),
lewtun (Member, Author) commented

Since I had to edit the features here, I also went ahead and reordered all the models alphabetically, since the list is now quite long and annoying to navigate.
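As a quick sanity check of the new mapping (a hedged snippet relying on the exporter's public feature lookup):

from transformers.onnx import FeaturesManager

# After this change, "masked-im" should no longer be listed for BEiT.
beit_features = FeaturesManager.get_supported_features_for_model_type("beit")
print(sorted(beit_features))  # expected: ['default', 'image-classification']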

sgugger (Collaborator) commented

This is hard to review as a diff, so I'll trust you didn't forget any of them ;-)

lewtun (Member, Author) commented Apr 28, 2022

Yeah, sorry about that. I did a sanity check that all the features agree with those on the main branch:

from transformers.onnx import FeaturesManager

# NB: run each capture on its respective checkout; a single process only
# sees whichever branch is installed.
# From current branch
features_new = FeaturesManager._SUPPORTED_MODEL_TYPE
# From main branch
features_old = FeaturesManager._SUPPORTED_MODEL_TYPE

for k, v in features_new.items():
    # Skip beit since its features are different on `main`
    if k == "beit":
        continue
    assert features_old[k].keys() == v.keys()
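
For anyone repeating the check, a variant that actually runs across two checkouts by persisting the mapping from `main` first (the file name is illustrative):

import json
from transformers.onnx import FeaturesManager

features = {k: sorted(v.keys()) for k, v in FeaturesManager._SUPPORTED_MODEL_TYPE.items()}

# Step 1, with `main` checked out:
# json.dump(features, open("features_main.json", "w"), indent=2)

# Step 2, with this branch checked out:
features_main = json.load(open("features_main.json"))
for model_type, feats in features.items():
    if model_type == "beit":  # BEiT's features intentionally differ after this PR
        continue
    assert feats == features_main[model_type]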

sgugger (Collaborator) left a comment

Thanks for working on this!


lewtun and others added 3 commits April 28, 2022 13:35
lewtun merged commit 675e2d1 into main on May 4, 2022
lewtun deleted the fix-beit-masked-im branch on May 4, 2022 at 08:05
elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022
* Add masked image modelling to task mapping

* Refactor ONNX features to be listed alphabetically

* Add warning about BEiT masked image modeling

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>