
Commit 2426221

mohiso22 (Mohit Soni) authored and committed
Modeling fix (quic#605)
Signed-off-by: Mohit Soni <mohisoni@qti.qualcom.com>
Co-authored-by: Mohit Soni <mohisoni@qti.qualcom.com>
Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>
1 parent 1767668 commit 2426221


2 files changed, 3 lines added, 0 lines deleted


QEfficient/transformers/models/modeling_auto.py

Lines changed: 2 additions & 0 deletions
@@ -1412,6 +1412,8 @@ def kv_offload_generate(
                 if x.startswith("past_") or x.endswith("_RetainedState")
             ]
         )
+        if not_mllama:
+            lang_session.skip_buffers(vision_outputs.keys())
 
         # Get first token
         lang_inputs["input_ids"] = outputs["logits"].argmax(2)
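The first hunk makes the language session skip the vision-output buffers (for every model except mllama) in addition to the retained-state buffers it already skipped, presumably so they are not rebound on every decode step. The following is a hypothetical, self-contained model of that filtering logic only. `SessionSketch`, `bound_inputs`, and `prepare_decode` are illustrative names, not QEfficient APIs, and buffer names such as `vision_embeds` and `past_key.0` are assumed for the example.

```python
from typing import Dict, Iterable, List, Set


class SessionSketch:
    """Hypothetical stand-in for an inference session (not QEfficient's real class)."""

    def __init__(self, input_names: List[str], output_names: List[str]) -> None:
        self.input_names = input_names
        self.output_names = output_names
        self.skipped: Set[str] = set()

    def skip_buffers(self, names: Iterable[str]) -> None:
        # Remember which buffers should no longer be bound on each run.
        self.skipped.update(names)

    def bound_inputs(self, inputs: Dict[str, object]) -> Dict[str, object]:
        # Only inputs that are not skipped would be transferred on a run.
        return {k: v for k, v in inputs.items() if k not in self.skipped}


def prepare_decode(session: SessionSketch, vision_outputs: Dict[str, object], not_mllama: bool) -> None:
    # Existing behavior: skip KV-cache inputs and their retained-state outputs.
    session.skip_buffers(
        [
            x
            for x in session.input_names + session.output_names
            if x.startswith("past_") or x.endswith("_RetainedState")
        ]
    )
    # Behavior added by the commit: also skip vision outputs for non-mllama models.
    if not_mllama:
        session.skip_buffers(vision_outputs.keys())
```

For example, after `prepare_decode(session, {"vision_embeds": None}, not_mllama=True)`, calling `session.bound_inputs({"input_ids": 1, "vision_embeds": 2})` keeps only `input_ids`; with `not_mllama=False` the vision buffer would still be bound.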

QEfficient/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py

Lines changed: 1 addition & 0 deletions
@@ -953,6 +953,7 @@ def smart_resize(
         grid_height = grid_h * grid_w
         grid_width = patch_size * patch_size * temporal_patch_size * channel
         vision_size = grid_height // 4
+        vision_size = vision_size * num_frames
         grid_height = grid_height * batch_size
 
         vision = [
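The second hunk scales `vision_size` by `num_frames`, so the vision sequence length now grows with the number of frames instead of covering only one. A standalone reproduction of the hunk's arithmetic makes the effect easy to check. This is a sketch, not the QEfficient code: the function name `vision_shapes` is hypothetical, and the defaults (`patch_size=14`, `temporal_patch_size=2`, `channel=3`) are assumed Qwen2.5-VL-style values rather than values taken from this diff. The `// 4` is kept as written; reading it as a 2x2 spatial merge of patches is an assumption.

```python
from typing import Dict


def vision_shapes(
    grid_h: int,
    grid_w: int,
    batch_size: int,
    num_frames: int,
    patch_size: int = 14,  # assumed default
    temporal_patch_size: int = 2,  # assumed default
    channel: int = 3,  # assumed default
) -> Dict[str, int]:
    # Number of patches in one frame's grid.
    grid_height = grid_h * grid_w
    # Flattened size of a single patch.
    grid_width = patch_size * patch_size * temporal_patch_size * channel
    # Per-frame vision length, reduced by 4 as in the hunk.
    vision_size = grid_height // 4
    # Line added by the commit: account for every frame.
    vision_size = vision_size * num_frames
    # Patch rows across the whole batch.
    grid_height = grid_height * batch_size
    return {"grid_height": grid_height, "grid_width": grid_width, "vision_size": vision_size}
```

For a 32x32 patch grid with one frame, `vision_size` is 1024 // 4 = 256; with `num_frames=4` it becomes 1024, whereas without the added line it would stay at 256 regardless of frame count.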
