[Llama4]: Add support for padding num_patches #486

Open · wants to merge 1 commit into base: main
Conversation

@vbaddi (Contributor) commented Jul 1, 2025

Summary

  • This PR makes the KV offload generation pipeline more robust by adding pixel-values padding for Vision-Language Models. The implementation takes an approach that works across different VLM architectures; it is currently enabled for Llama4.

Problem Statement

  • During KV offload generation in modeling_auto.py, some VLM models require a specific pixel-values tensor shape that may not match the input data. Previously, a mismatch between the input pixel values and the patch count expected by the compiled model could cause runtime errors or degraded performance.

Solution

  • Model-Specific Patch Count Method: Added a get_expected_patch_count() method to VLM model classes that returns the expected number of patches as an integer (17 in the case of Llama4).
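The padding step described above can be sketched as follows. This is an illustrative sketch only (using NumPy arrays in place of torch tensors): the helper name `pad_pixel_values` is hypothetical, and the exact hook into modeling_auto.py is assumed; `expected_patches` would come from the model-specific `get_expected_patch_count()` method (17 for Llama4, per this PR).

```python
import numpy as np

def pad_pixel_values(pixel_values: np.ndarray, expected_patches: int) -> np.ndarray:
    """Zero-pad the leading (patch) dimension up to `expected_patches`.

    Hypothetical helper illustrating the padding this PR describes; the
    real implementation would operate on torch tensors inside the model.
    """
    num_patches = pixel_values.shape[0]
    if num_patches >= expected_patches:
        # Input already matches (or exceeds) the compiled patch count.
        return pixel_values
    # Pad only the first (patch) axis; leave channel/height/width untouched.
    pad_widths = [(0, expected_patches - num_patches)] + [(0, 0)] * (pixel_values.ndim - 1)
    return np.pad(pixel_values, pad_widths, mode="constant")
```

For example, an input with 5 patches of shape (5, 3, 336, 336) would be padded to (17, 3, 336, 336) for a model compiled with an expected patch count of 17, with the original patches preserved and the remainder zero-filled.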

…PATCHES

Signed-off-by: vbaddi <quic_vbaddi@quicinc.com>
@vbaddi vbaddi self-assigned this Jul 1, 2025
@vbaddi vbaddi added enhancement New feature or request 1.20.0 labels Jul 1, 2025