QVAC-19119 test(llm-llamacpp): overlay qvac-fabric for Qwen3VL multi-tile batching #2515

Open
yingying0906 wants to merge 1 commit into tetherto:main from yingying0906:test/qwen3vl-multitile-batching

yingying0906 commented Jun 10, 2026

What problem does this PR solve?

This PR is for CI testing only and will not be merged.

  1. Qwen3.5 lacked multi-tile support. Images were always processed as a single tile regardless of resolution, losing detail on high-resolution inputs.
  2. Even where multi-tiling existed, the vision encoder ran N sequential forward passes, one per tile. On GPU backends this is inefficient; a single batched dispatch over all tiles produces identical embeddings with far less overhead.

How does it solve it?

Implements multi-tile image preprocessing for Qwen3.5 (using the qwen3vl encoder path it shares with Qwen3VL), and replaces the N sequential vision encoder forward passes with a single batched forward pass over all tiles (batch dim = N tiles).
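
As a rough illustration of what "batch dim = N tiles" means, the sketch below splits an image into a uniform grid and reports the shape of the stacked encoder input. The uniform 2x2 grid is an assumption chosen to match the 4-tile 1920x1280 case in the benchmark; the actual tile selection in the qwen3vl path is not shown in this PR and may differ. `grid_tiles` and `batch_shape` are hypothetical helper names.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def grid_tiles(width: int, height: int, cols: int, rows: int) -> List[Box]:
    """Split an image into a cols x rows grid of crop boxes (row-major order)."""
    if cols <= 0 or rows <= 0:
        raise ValueError("cols and rows must be positive")
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left = c * width // cols
            top = r * height // rows
            right = (c + 1) * width // cols
            bottom = (r + 1) * height // rows
            boxes.append((left, top, right, bottom))
    return boxes


def batch_shape(boxes: List[Box], channels: int = 3) -> Tuple[int, int, int, int]:
    """Shape (N, C, H, W) of the stacked input; tiles must all be the same size."""
    sizes = {(b[2] - b[0], b[3] - b[1]) for b in boxes}
    if len(sizes) != 1:
        raise ValueError("tiles must share one size to be stacked into a batch")
    (tile_w, tile_h), = sizes
    return (len(boxes), channels, tile_h, tile_w)


# Assumed 2x2 grid for the 1920x1280 benchmark image (4 tiles).
tiles = grid_tiles(1920, 1280, cols=2, rows=2)
print(batch_shape(tiles))  # (4, 3, 640, 960)
```

In sequential mode each of the four boxes would be encoded on its own; in batched mode the four crops are stacked along the leading dimension and encoded once.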

A new --image-tile-mode flag controls the dispatch mode:

| Mode | Behaviour |
| --- | --- |
| batched | Single GPU forward pass over all tiles |
| sequential | Original N sequential passes (for benchmarking) |
| baseline | Single-tile dyn_size path, matches current master behaviour (default) |
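
The three modes and the default can be mirrored in a small stand-in parser. This is only a sketch using Python's `argparse`, not the real option parsing in the fork; `build_parser` and `forward_passes` are hypothetical names, and the pass counts are read directly off the table above.

```python
import argparse

TILE_MODES = ("batched", "sequential", "baseline")


def build_parser() -> argparse.ArgumentParser:
    """Stand-in parser exposing the flag with the same values and default."""
    parser = argparse.ArgumentParser(prog="tile-mode-demo")
    parser.add_argument(
        "--image-tile-mode",
        choices=TILE_MODES,
        default="baseline",  # matches current master behaviour
        help="how image tiles are dispatched to the vision encoder",
    )
    return parser


def forward_passes(mode: str, n_tiles: int) -> int:
    """Vision-encoder forward passes implied by each mode for n_tiles tiles."""
    if mode == "sequential":
        return n_tiles  # one pass per tile
    if mode in ("batched", "baseline"):
        return 1  # one batched pass, or one single-tile pass
    raise ValueError(f"unknown tile mode: {mode}")


args = build_parser().parse_args(["--image-tile-mode", "batched"])
print(args.image_tile_mode, forward_passes(args.image_tile_mode, n_tiles=4))
```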

Preliminary benchmark (Qwen3.5-2B Q4_K_M, F16 mmproj, 1920x1280 image, 4 tiles):

MacBook (M5 Pro, Metal):

| Variant | Encode (ms) | Prompt eval (ms) | Decode (tok/s) |
| --- | --- | --- | --- |
| Baseline (master) | 3,503 | 5,104 | 63.6 |
| Sequential | 1,723 | 3,258 | 62.4 |
| Batched | 1,762 | 3,297 | 63.0 |

Samsung S25 Ultra (Adreno 830, OpenCL):

| Variant | Encode (ms) | Prompt eval (ms / tokens) | Prompt eval (tok/s) |
| --- | --- | --- | --- |
| Baseline (master) | 57,288 | 199,552 / 2418 tok | 12.1 |
| Sequential | 24,991 | 139,200 / 2322 tok | 16.7 |
| Batched | 15,865 | 117,811 / 2322 tok | 19.7 |
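
To make the encode-time comparison easier to read, the snippet below computes speedup ratios from the encode figures reported above. The numbers are copied from the two tables; only the helper name `speedup` and the dictionary layout are introduced here for illustration.

```python
# Encode times in milliseconds, copied from the benchmark tables.
ENCODE_MS = {
    "M5 Pro (Metal)": {"baseline": 3503, "sequential": 1723, "batched": 1762},
    "S25 Ultra (OpenCL)": {"baseline": 57288, "sequential": 24991, "batched": 15865},
}


def speedup(before_ms: float, after_ms: float) -> float:
    """Ratio > 1 means 'after' is faster than 'before'."""
    return before_ms / after_ms


for device, ms in ENCODE_MS.items():
    vs_baseline = speedup(ms["baseline"], ms["batched"])
    vs_sequential = speedup(ms["sequential"], ms["batched"])
    print(f"{device}: batched is {vs_baseline:.2f}x vs baseline, "
          f"{vs_sequential:.2f}x vs sequential")
# M5 Pro (Metal): batched is 1.99x vs baseline, 0.98x vs sequential
# S25 Ultra (OpenCL): batched is 3.61x vs baseline, 1.58x vs sequential
```

On Metal, batched and sequential encode times are within about 2% of each other; on the Adreno 830 the batched dispatch is roughly 1.58x faster than sequential for this single preliminary run.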

Breaking changes

None. To revert: delete packages/llm-llamacpp/vcpkg/ports/qvac-fabric/ and remove the VCPKG_OVERLAY_PORTS line from CMakeLists.txt.
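
The revert steps could be scripted roughly as follows. This is a sketch under assumptions: it presumes the `CMakeLists.txt` in question sits at the root of `packages/llm-llamacpp/` and that the overlay setting occupies a single line; `strip_overlay_lines` and `revert_overlay` are hypothetical helpers, not part of the repository.

```python
import shutil
from pathlib import Path


def strip_overlay_lines(cmake_text: str) -> str:
    """Drop every line mentioning VCPKG_OVERLAY_PORTS; keep the rest verbatim."""
    kept = [
        line
        for line in cmake_text.splitlines(keepends=True)
        if "VCPKG_OVERLAY_PORTS" not in line
    ]
    return "".join(kept)


def revert_overlay(package_dir: Path) -> None:
    """Remove the overlay port directory and the overlay line from CMakeLists.txt."""
    port_dir = package_dir / "vcpkg" / "ports" / "qvac-fabric"
    shutil.rmtree(port_dir, ignore_errors=True)  # no error if already absent
    cmake_file = package_dir / "CMakeLists.txt"  # assumed location
    cmake_file.write_text(strip_overlay_lines(cmake_file.read_text()))


# Usage (run from the repository root):
# revert_overlay(Path("packages/llm-llamacpp"))
```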

🤖 Generated with Claude Code

…ulti-tile-batching

Points qvac-fabric to Rita's fork commit c463240a for testing the
Qwen3VL multi-tile batching feature (--image-tile-mode flag).

Remove vcpkg/ports/qvac-fabric/ to restore the registry version.
yingying0906 requested review from a team as code owners June 10, 2026 09:14
yingying0906 reopened this Jun 10, 2026
yingying0906 changed the title from "test(llm-llamacpp): overlay qvac-fabric for Qwen3VL multi-tile batching" to "QVAC-19119 test(llm-llamacpp): overlay qvac-fabric for Qwen3VL multi-tile batching" Jun 10, 2026