
[kv_offload+HMA][1/N]: Worker-side support for multiple HMA groups #34680

Open
orozery wants to merge 1 commit into vllm-project:main from orozery:cpu-offloading-hma-worker

Conversation

@orozery
Collaborator

@orozery orozery commented Feb 17, 2026

This PR enables the kv_offload worker to load KV data for multiple KV cache groups.
The worker still receives a single list of block IDs to load, ordered by group: group0_block_ids...group1_block_ids... We add a field, group_sizes, which encodes the size of each group of block IDs. This lets the worker determine the positions where a new group of block IDs starts. This is needed to check whether the corresponding data from the offloaded medium (e.g. CPU) should be skipped when offloaded_block_size > logical_block_size (gpu_block_size).
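
To illustrate the layout described above, here is a minimal sketch of how a flat, group-ordered list of block IDs can be split back into per-group lists using group_sizes. The function name `split_by_group` and the sample values are hypothetical and are not taken from this PR's diff; the actual worker code may handle group boundaries differently.

```python
from itertools import accumulate


def split_by_group(block_ids: list[int], group_sizes: list[int]) -> list[list[int]]:
    """Split a flat list of block IDs, ordered by group, into per-group lists.

    Hypothetical helper for illustration only; not part of the PR.
    """
    if sum(group_sizes) != len(block_ids):
        raise ValueError("group_sizes must sum to len(block_ids)")
    # Running totals give the end offset of each group in the flat list.
    ends = list(accumulate(group_sizes))
    # Each group starts where the previous one ended.
    starts = [0] + ends[:-1]
    return [block_ids[s:e] for s, e in zip(starts, ends)]


# Two groups: the first has 3 block IDs, the second has 2.
groups = split_by_group([10, 11, 12, 20, 21], [3, 2])
```

The start offsets computed here are the positions where a new group begins, which is where a per-group check (such as whether to skip part of a larger offloaded block) would apply.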

@orozery orozery requested review from NickLucche and njhill February 17, 2026 08:21
@orozery orozery requested a review from ApostaC as a code owner February 17, 2026 08:21
@mergify mergify bot added the v1 label Feb 17, 2026
This commit enables the kv_offload worker to load KV data
for multiple KV cache groups.
The worker still receives a single list of block IDs to load, where the block IDs
are ordered by group: group0_block_ids...group1_block_ids...
We add a field, group_sizes, which encodes the size of each group of block IDs.
This lets the worker determine the positions where a new group of block IDs starts.
This is needed to check whether the corresponding data from the offloaded medium (e.g. CPU)
should be skipped when offloaded_block_size > logical_block_size (gpu_block_size).

Signed-off-by: Or Ozeri <oro@il.ibm.com>
@orozery orozery force-pushed the cpu-offloading-hma-worker branch from 8ea97a9 to 13fc036 Compare February 17, 2026 08:24
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request successfully introduces support for multiple KV cache groups in the kv_offload worker. This is achieved by adding a group_sizes field to the load/store specifications, allowing the worker to correctly handle unaligned transfers (where offloaded block size exceeds logical block size) for each group independently. The changes include updates to the metadata classes, the core offloading handler logic, and comprehensive test coverage. The implementation maintains backward compatibility for single-group transfers and includes robust assertions to ensure data integrity during the transfer process.
