[Llama4]: Add support for padding num_patches #486

Open · wants to merge 1 commit into base: main
Conversation

@vbaddi (Contributor) commented Jul 1, 2025

Summary

  • This PR makes the KV offload generation pipeline more robust by adding pixel-values padding for Vision-Language Models. The implementation takes an approach that works across different VLM architectures; it is currently enabled for Llama4.

Problem Statement

  • During KV offload generation in modeling_auto.py, some VLM models require a specific pixel-values tensor shape that may not match the input data. Previously, a mismatch between the input pixel values and the patch count expected by the compiled model could cause runtime errors or degraded performance.

Solution

  • Model-Specific Patch Count Method: Added a get_expected_patch_count() method to VLM model classes that returns the expected number of patches as an integer (17 in the case of Llama4).
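The padding step described above can be sketched as follows. This is an illustrative sketch only (using NumPy arrays in place of torch tensors): the helper name `pad_pixel_values` is hypothetical, and the exact hook into modeling_auto.py is assumed; `expected_patches` would come from the model-specific `get_expected_patch_count()` method (17 for Llama4, per this PR).

```python
import numpy as np

def pad_pixel_values(pixel_values: np.ndarray, expected_patches: int) -> np.ndarray:
    """Zero-pad the leading (patch) dimension up to `expected_patches`.

    Hypothetical helper illustrating the padding this PR describes; the
    real implementation would operate on torch tensors inside the model.
    """
    num_patches = pixel_values.shape[0]
    if num_patches >= expected_patches:
        # Input already matches (or exceeds) the compiled patch count.
        return pixel_values
    # Pad only the first (patch) axis; leave channel/height/width untouched.
    pad_widths = [(0, expected_patches - num_patches)] + [(0, 0)] * (pixel_values.ndim - 1)
    return np.pad(pixel_values, pad_widths, mode="constant")
```

For example, an input with 5 patches of shape (5, 3, 336, 336) would be padded to (17, 3, 336, 336) for a model compiled with an expected patch count of 17, with the original patches preserved and the remainder zero-filled.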

…PATCHES

Signed-off-by: vbaddi <quic_vbaddi@quicinc.com>
@vbaddi vbaddi self-assigned this Jul 1, 2025
@vbaddi vbaddi added enhancement New feature or request 1.20.0 labels Jul 1, 2025