enable static cache on vision encoder decoder #39773

jiqing-feng · 2025-07-30T06:49:37Z

Now that issue #39746 is fixed, the vision encoder decoder model can support static cache and get a more significant speed-up from torch.compile.

import time
import requests
import torch
import PIL.Image
from transformers import pipeline

model_id = "nlpconnect/vit-gpt2-image-captioning"
image_to_text = pipeline("image-to-text", model=model_id, device="cpu", torch_dtype=torch.float16)
image_url = "https://ankur3107.github.io/assets/images/image-captioning-example.png"
image = PIL.Image.open(requests.get(image_url, stream=True, timeout=3000).raw)
generation_config = image_to_text.model.generation_config
generation_config.cache_implementation = "static"

# Warm up in eager mode so one-time setup costs are not measured.
for _ in range(10):
    output = image_to_text(image, generate_kwargs={"generation_config": generation_config})

start = time.time()
output = image_to_text(image, generate_kwargs={"generation_config": generation_config})
end = time.time()
print(f"eager mode pipeline latency {end - start}")

# Compile the model's forward pass; the static cache keeps tensor shapes fixed across decoding steps.
image_to_text.model.forward = torch.compile(image_to_text.model.forward)

# Warm up again so compilation time is excluded from the measurement.
for _ in range(10):
    output = image_to_text(image, generate_kwargs={"generation_config": generation_config})

start = time.time()
output = image_to_text(image, generate_kwargs={"generation_config": generation_config})
end = time.time()
print(f"compile mode pipeline latency {end - start}")
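
A single timed call can be noisy, so the comparison above could also be taken as a median over several runs. This is a minimal sketch using only the standard library; `benchmark` and its `warmup` and `runs` parameters are hypothetical names for illustration and are not part of transformers.

```python
import statistics
import time


def benchmark(fn, warmup=10, runs=5):
    """Call fn() `warmup` times untimed, then return the median latency (seconds) over `runs` timed calls."""
    for _ in range(warmup):
        fn()
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    return statistics.median(latencies)
```

With the script above it could be used as `benchmark(lambda: image_to_text(image, generate_kwargs={"generation_config": generation_config}))`, once before and once after the `torch.compile` line.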

jiqing-feng · 2025-07-30T06:50:03Z

Hi @zucchini-nlp , could you please review this change? Thanks!

github-actions · 2025-07-30T06:50:38Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: vision_encoder_decoder

jiqing-feng · 2025-07-30T06:52:37Z

run-slow: vision_encoder_decoder

zucchini-nlp

Yeah, makes sense, since the model can support compile as long as the LM supports it. Usually we need these to be class attributes, so we can check them before initializing the class. For vision encoder decoder, though, we can't do that yet, so it's fine.

Thanks!
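
The class-attribute point can be sketched roughly as follows. All names here (`TinyDecoder`, `LegacyDecoder`, `EncoderDecoderWrapper`, `supports_static_cache`) are hypothetical and are not the actual transformers classes or flags; the sketch only shows why a wrapper whose decoder is chosen at init time has to set the flag per instance.

```python
class TinyDecoder:
    # Class attribute: readable without creating an instance.
    supports_static_cache = True


class LegacyDecoder:
    supports_static_cache = False


class EncoderDecoderWrapper:
    # The decoder is only known at init time, so the class-level
    # value is just a conservative default.
    supports_static_cache = False

    def __init__(self, decoder):
        self.decoder = decoder
        # Instance attribute shadows the class default and mirrors
        # whatever the wrapped decoder supports.
        self.supports_static_cache = decoder.supports_static_cache
```

Here `EncoderDecoderWrapper.supports_static_cache` stays at its default when read on the class, while each instance reports the capability of the decoder it wraps.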

HuggingFaceDocBuilderDev · 2025-07-30T08:11:17Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

jiqing-feng marked this pull request as ready for review July 30, 2025 06:49

github-actions bot requested a review from Rocketknight1 July 30, 2025 06:50

zucchini-nlp approved these changes Jul 30, 2025

zucchini-nlp enabled auto-merge (squash) July 30, 2025 07:59

zucchini-nlp merged commit 8ab21be into huggingface:main Jul 30, 2025
20 checks passed

enable static cache on vision encoder decoder

b6ffb4f

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
