Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix] LlavaNextProcessor '_get_unpadded_features' method #33263

Merged

Conversation

laurentd-lunit
Copy link
Contributor

@laurentd-lunit laurentd-lunit commented Sep 2, 2024

What does this PR do?

Fixes 33261 (issue)
The typo in LlavaNextProcessor's _get_unpadded_features method can lead to size mismatch between the number of image tokens computed by the processor and the number of image tokens resulting of the vision tower and then leads to an error when applying the masked_scatter operation (here)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

I believe @zucchini-nlp would be the right person to tag here.

Copy link
Member

@zucchini-nlp zucchini-nlp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks for fixing. There's another PR (#32907) to clean-up the bugs left from the prev refactoring, but I am okay to merge this one now. The other one will take some time to merge :)

@zucchini-nlp
Copy link
Member

The tests didn't show the issue probably because we usually have a square image, instead of 2-1 or 1-2 resolution. @laurentd-lunit can you add a test for this one pls?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Member

@LysandreJik LysandreJik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Indeed it would be good to have a test to ensure this does not happen again

@laurentd-lunit
Copy link
Contributor Author

@zucchini-nlp @LysandreJik Thanks for the reviews! I added a test too. check_repository_consistency checks fail now but this is related to other changes than those I made so not sure if I should fix it here?

@zucchini-nlp
Copy link
Member

@laurentd-lunit can you run make fix-copies? Yes, seems like the PR recently merged didn't rebase on main before it was merged

@zucchini-nlp
Copy link
Member

cool, thanks for fixing the copies, will merge the PR now

@zucchini-nlp zucchini-nlp merged commit d703477 into huggingface:main Sep 4, 2024
17 checks passed
itazap pushed a commit to NielsRogge/transformers that referenced this pull request Sep 20, 2024
…#33263)

* [fix] LlavaNextProcessor '_get_unpadded_features' method

* [tests] add test_image_token_filling

* [chore] style + comment

* [minor] improve readability

* [chore] run make fix-copies
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LlavaNextProcessor bug in _get_unpadded_features
4 participants