
[aya vision] fix processor for vLLM #38371


Merged
merged 2 commits into huggingface:main on May 27, 2025

Conversation

zucchini-nlp
Member

What does this PR do?

Fixes #38350

@DarkLight1337 , one quick question. I am trying to add a test on our side so we don't break anything that vLLM relies on. For the case of aya-vision, the error log shows the inputs as (text="<image>", images=None), but doing so will fail on other processors where we have extra checks that "num input images == num image tokens".

I wonder how that is bypassed in vLLM, and how I can add a test to cover the assumptions vLLM makes about our processors.
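For reference, the failing call pattern described in the linked issue looks roughly like the sketch below. The checkpoint name and prompt text are assumptions for illustration, not taken from the PR; only the (text with an <image> placeholder, images=None) shape of the call comes from the error log quoted above.

```python
from transformers import AutoProcessor

# Hypothetical reproduction of the vLLM-style call: the prompt already
# contains the <image> placeholder, but no image data is passed with it.
processor = AutoProcessor.from_pretrained("CohereForAI/aya-vision-8b")  # checkpoint name assumed

inputs = processor(
    text="<image>Describe the picture.",  # placeholder token with no matching image
    images=None,
    return_tensors="pt",
)
# Before this PR's fix, this call raised instead of returning token ids.
print(inputs["input_ids"].shape)
```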

DarkLight1337 commented May 26, 2025

doing so will fail on other processors where we have extra checks that "num input images == num image tokens"

This behavior seems quite inconsistent across models. I found that most processors are able to take extra <image> tokens (etc.) in the text without problem, i.e. the condition is loosened to num_input_images <= num_image_tokens. As a result, we also adopt this loosened assumption in vLLM.

For the processors that perform this check more strictly, in vLLM we override _call_hf_processor with a special case to call the tokenizer directly if no multi-modal data is provided.

Example:

https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/idefics3.py#L321-L325
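For readers who don't want to follow the link, a minimal sketch of that kind of override is shown below. The class and base-class names are placeholders and the attribute access is modeled on the linked idefics3 code; this is not the exact vLLM implementation.

```python
from transformers import BatchFeature

class AyaVisionMultiModalProcessor(BaseMultiModalProcessor):  # base class name is a placeholder
    def _call_hf_processor(self, prompt, mm_data, mm_kwargs):
        if not mm_data:
            # Text-only request: tokenize directly so the HF processor's
            # "num input images == num image tokens" check is never triggered.
            prompt_ids = self.info.get_tokenizer().encode(prompt)
            return BatchFeature(dict(input_ids=[prompt_ids]), tensor_type="pt")
        # Otherwise defer to the regular HF processor call.
        return super()._call_hf_processor(prompt, mm_data, mm_kwargs)
```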


@zucchini-nlp
Member Author

Ah okay, that makes sense. We haven't been consistent about checking placeholder tokens in processors; each model contributor added the check as it was in the original implementation.

In that case I don't think I can add a robust test. I believe the integration with the Transformers backend will automatically resolve future issues; I am planning to get it ready this week.

@zucchini-nlp zucchini-nlp requested a review from Cyrilvallez May 26, 2025 09:51
@zucchini-nlp zucchini-nlp added the for patch Tag issues / labels that should be included in the next patch label May 26, 2025
@DarkLight1337

Sounds good, thanks for fixing!

@ArthurZucker
Collaborator

Thanks for adding the for-patch tag as well!

@zucchini-nlp zucchini-nlp enabled auto-merge (squash) May 27, 2025 09:31
@zucchini-nlp zucchini-nlp merged commit 1a5be2f into huggingface:main May 27, 2025
10 checks passed

DarkLight1337 commented Jun 2, 2025

Looks like this PR didn't make it into the release :(

@zucchini-nlp
Member Author

Hmm, cc @ArthurZucker. On the main branch it is fine, since I merged the last vLLM-related PR last Friday. The latest patch release doesn't include it though 🥲

@ArthurZucker
Collaborator

It did, but the diff was empty.

Labels: for patch
Linked issue: Unable to pass images to Aya Vision processor (#38350)