[kv_offload+HMA][1/N]: Worker-side support for multiple HMA groups by orozery · Pull Request #34680 · vllm-project/vllm

orozery · 2026-02-17T08:21:17Z

This PR enables the kv_offload worker to load KV data for multiple KV cache groups.
The worker still receives a single flat list of block IDs to load, ordered by group (group0_block_ids..., group1_block_ids..., and so on). We add a field, group_sizes, which encodes the number of block IDs in each group. This lets the worker determine the positions where each new group of block IDs starts. It needs those positions to check whether the corresponding data from the offloaded medium (e.g. CPU) should be skipped when offloaded_block_size > logical_block_size (gpu_block_size).
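As a rough illustration of the layout described above, the following sketch splits a flat, group-ordered block ID list using group_sizes. The helper names `group_start_positions` and `split_by_group` are hypothetical and do not come from the PR; the actual worker code may represent block IDs differently (for example as arrays rather than Python lists).

```python
from itertools import accumulate


def group_start_positions(group_sizes):
    """Return the index in the flat block-ID list where each group begins.

    Hypothetical helper for illustration; not taken from the PR.
    """
    # Running totals give the end of each group; shifting by one gives starts.
    return [0, *accumulate(group_sizes)][:-1]


def split_by_group(block_ids, group_sizes):
    """Split a flat, group-ordered list of block IDs into one list per group."""
    if sum(group_sizes) != len(block_ids):
        raise ValueError("group_sizes must sum to the number of block IDs")
    starts = group_start_positions(group_sizes)
    return [block_ids[s:s + n] for s, n in zip(starts, group_sizes)]


if __name__ == "__main__":
    # Two groups: three block IDs for group 0, two for group 1.
    flat_ids = [10, 11, 12, 20, 21]
    print(group_start_positions([3, 2]))
    print(split_by_group(flat_ids, [3, 2]))
```

Passing only the sizes keeps the block ID list flat, so a single-group transfer looks the same as before with one entry in group_sizes.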

Signed-off-by: Or Ozeri <oro@il.ibm.com>

gemini-code-assist

Code Review

The pull request successfully introduces support for multiple KV cache groups in the kv_offload worker. This is achieved by adding a group_sizes field to the load/store specifications, allowing the worker to correctly handle unaligned transfers (where offloaded block size exceeds logical block size) for each group independently. The changes include updates to the metadata classes, the core offloading handler logic, and comprehensive test coverage. The implementation maintains backward compatibility for single-group transfers and includes robust assertions to ensure data integrity during the transfer process.
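To make the unaligned case concrete, here is a minimal sketch of how a per-group skip count could be derived when one offloaded block holds several GPU blocks. It assumes that offloaded_block_size is an integer multiple of gpu_block_size and that any unused sub-blocks sit at the start of a group's first offloaded block. Neither assumption is confirmed by the text above, and the function names are hypothetical.

```python
def sub_blocks_to_skip(num_gpu_blocks, block_size_factor):
    """Number of leading sub-blocks to skip for one group.

    block_size_factor is assumed to be offloaded_block_size // gpu_block_size.
    Hypothetical helper; the PR's real logic may differ.
    """
    if block_size_factor < 1:
        raise ValueError("block_size_factor must be at least 1")
    # Python's modulo on a negative operand yields the padding needed to
    # round num_gpu_blocks up to a multiple of block_size_factor.
    return -num_gpu_blocks % block_size_factor


def per_group_plan(group_sizes, block_size_factor):
    """For each group, return (offloaded_blocks_needed, sub_blocks_to_skip)."""
    plan = []
    for n in group_sizes:
        skip = sub_blocks_to_skip(n, block_size_factor)
        plan.append(((n + skip) // block_size_factor, skip))
    return plan


if __name__ == "__main__":
    # Three groups with 5, 4 and 1 GPU blocks; each offloaded block holds 4.
    print(per_group_plan([5, 4, 1], 4))
```

Computing the skip per group, rather than once for the whole transfer, is what makes the group boundaries matter: each group can start at a different offset within its first offloaded block.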

orozery requested review from NickLucche and njhill February 17, 2026 08:21

orozery requested a review from ApostaC as a code owner February 17, 2026 08:21

mergify bot added the v1 label Feb 17, 2026

orozery force-pushed the cpu-offloading-hma-worker branch from 8ea97a9 to 13fc036 Compare February 17, 2026 08:24

gemini-code-assist bot reviewed Feb 17, 2026

orozery wants to merge 1 commit into vllm-project:main from orozery:cpu-offloading-hma-worker
