Remove masked image modeling from BEIT ONNX export #16980
Conversation
Hi, there's a reason I haven't added BEiT to the auto classes: it can't be used with the run_mim.py script, because BEiT handles masked image modeling differently from the other models (which do it similarly to the way it's defined in the SimMIM paper). This may confuse users, so maybe we should properly document that BEiT is not the same as the other ones.
Ah I see, but isn't it a bit odd to exclude BEiT just because it isn't compatible with our example scripts? For instance, is there anything fundamentally wrong with loading it through the autoclass? If not, I'd prefer to keep BEiT in the autoclasses and put the warning inside the relevant docs.
Hmm, maybe there is a fundamental issue with using BEiT in the autoclasses, as I'm seeing the torch tests fail with:
Well, yeah, that's because BEiT does masked image modeling by predicting visual tokens of a VQ-VAE, whereas the other models predict pixel values (RGB) as in the SimMIM paper. So I'm afraid BEiT cannot be added to this auto class.
OK, thanks for the clarification. I'll remove this feature from the ONNX export and add a note to the BEiT docs :)
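To make the difference concrete, here is a rough stdlib-only sketch (not the actual Transformers code; function names and the 8192-entry codebook size are illustrative) of the two prediction-target shapes: pixel-based models regress RGB values per masked patch, while BEiT classifies each masked patch into a discrete VQ-VAE visual-token vocabulary.

```python
# Conceptual sketch of the two masked-image-modeling objectives.
# Names and numbers are illustrative, not Transformers API.

def pixel_mim_target_shape(num_patches, patch_size, channels=3):
    """SimMIM-style MIM (ViT/DeiT): regress RGB values for each patch."""
    return (num_patches, patch_size * patch_size * channels)

def beit_mim_target_shape(num_patches, visual_vocab_size=8192):
    """BEiT-style MIM: classify each patch into a VQ-VAE codebook entry."""
    return (num_patches, visual_vocab_size)

print(pixel_mim_target_shape(196, 16))  # (196, 768)
print(beit_mim_target_shape(196))       # (196, 8192)
```

The heads therefore have incompatible output spaces, which is why a single pixel-based autoclass can't cover both.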
docs/source/en/model_doc/beit.mdx

@@ -59,6 +59,12 @@ Tips:
   `use_relative_position_bias` attribute of [`BeitConfig`] to `True` in order to add
   position embeddings.

+<Tip warning={true}>
+
+BEiT does masked image modeling by predicting visual tokens of a Vector-Quantized Variational Autoencoder (VQ-VAE), whereas other vision models like ViT and DeiT predict RGB pixel values. The [`AutoModelForMaskedImageModeling`] class supports pixel-based image modeling, so you will need to use [`BeitForMaskedImageModeling`] directly if you wish to do masked image modeling with BEiT.
+
+</Tip>
Not sure if this should be on the main doc page or within the docstring for the BeitForMaskedImageModeling class. Happy to move it if you want!

Edit: decided it made more sense to put this in the docstring itself in eca26be
"default": OrderedDict({"last_hidden_state": {0: "batch", 1: "sequence"}}), | ||
"image-classification": OrderedDict({"logits": {0: "batch", 1: "sequence"}}), | ||
"masked-im": OrderedDict({"logits": {0: "batch", 1: "sequence"}}), |
Since I had to add this feature to support masked image modeling in general, I also went ahead and rearranged these features alphabetically, as it was getting annoying to inspect what was available.
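The entries above are plain ordered mappings from output name to dynamic-axis labels, so keeping them alphabetical is easy to verify mechanically. A minimal sketch under that assumption (the miniature table below is illustrative, not the full Transformers mapping):

```python
from collections import OrderedDict

# Hypothetical miniature of the feature -> ONNX output spec table.
outputs_by_feature = OrderedDict([
    ("default", OrderedDict({"last_hidden_state": {0: "batch", 1: "sequence"}})),
    ("image-classification", OrderedDict({"logits": {0: "batch", 1: "sequence"}})),
    ("masked-im", OrderedDict({"logits": {0: "batch", 1: "sequence"}})),
])

# Alphabetical key order makes it easy to eyeball what is supported.
assert list(outputs_by_feature) == sorted(outputs_by_feature)
```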
    onnx_config_cls=MBartOnnxConfig,
),
# BEiT cannot be used with the masked image modeling autoclass, so this feature is excluded here
"beit": supported_features_mapping("default", "image-classification", onnx_config_cls=BeitOnnxConfig),
Because I had to edit the features here, I also went ahead and reordered all the models alphabetically, since the list is now quite long and annoying to navigate.
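A reordering like this is easy to sanity-check with a small helper that pinpoints the first out-of-order key, sketched here with a hypothetical key list rather than the real mapping:

```python
def first_unsorted_key(keys):
    """Return the first key that breaks alphabetical order, or None if sorted."""
    prev = None
    for k in keys:
        if prev is not None and k < prev:
            return k
        prev = k
    return None

# Illustrative model-type keys, not the full supported list.
models = ["albert", "bart", "beit", "bert", "mbart"]
assert first_unsorted_key(models) is None
assert first_unsorted_key(["bert", "albert"]) == "albert"
```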
This is hard to review as a diff, so I'll trust you didn't forget any of them ;-)
Yeah, sorry about that. I did a sanity check that all the features agree with those on the main branch:
from transformers.onnx import FeaturesManager

# Snapshot taken on the current branch
features_new = FeaturesManager._SUPPORTED_MODEL_TYPE
# Snapshot taken on the `main` branch (captured separately with
# `main` checked out, then compared against the snapshot above)
features_old = FeaturesManager._SUPPORTED_MODEL_TYPE

for k, v in features_new.items():
    # Skip beit since its features are different on `main`
    if k == "beit":
        continue
    assert features_old[k].keys() == v.keys()
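The same idea can be sketched as a self-contained, stdlib-only comparison over plain dicts (the real check compares `FeaturesManager._SUPPORTED_MODEL_TYPE` across branches; the helper name and sample tables below are hypothetical):

```python
def diff_feature_tables(old, new, skip=()):
    """Report models whose supported-feature sets changed between two tables.

    Returns {model: (removed_features, added_features)} for changed models.
    """
    changed = {}
    for model in old.keys() | new.keys():
        if model in skip:
            continue
        old_feats = set(old.get(model, {}))
        new_feats = set(new.get(model, {}))
        if old_feats != new_feats:
            changed[model] = (sorted(old_feats - new_feats),
                              sorted(new_feats - old_feats))
    return changed

# Illustrative tables: beit loses "masked-im" on the new branch.
old = {"bert": {"default": 1, "masked-lm": 1}, "beit": {"default": 1, "masked-im": 1}}
new = {"bert": {"default": 1, "masked-lm": 1}, "beit": {"default": 1}}
assert diff_feature_tables(old, new, skip=("beit",)) == {}
assert diff_feature_tables(old, new) == {"beit": (["masked-im"], [])}
```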
Thanks for working on this!
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add masked image modelling to task mapping
* Refactor ONNX features to be listed alphabetically
* Add warning about BEiT masked image modeling
What does this PR do?
This PR removes masked image modeling from the list of supported BEiT features in the ONNX exporter. As explained by @NielsRogge, BEiT cannot be loaded with the AutoModelForMaskedImageModeling class, because it predicts the visual tokens of a VQ-VAE rather than pixel values.

I've also added a note in the BEiT docs to help users who don't know these details, and checked that the slow ONNX tests pass.
Edit: we should merge this after #16981 to ensure the RoFormer tests pass first