[WIP] Hybrid Memory Allocator #16178

Closed
wants to merge 21 commits
19 changes: 12 additions & 7 deletions vllm/v1/core/block_pool.py
@@ -39,7 +39,7 @@
         # enabled).
         self.free_block_queue = FreeKVCacheBlockQueue(self.blocks)
 
-        # {block_hash: {block ID: block}}. A cached block is
+        # {block_hash: {group ID: {block ID: block}}}. A cached block is
         # a full block with a block hash that can be used for prefix caching.
         # The cached block may be used by running requests or in the
         # free_block_queue that could potentially be evicted.
@@ -48,16 +48,19 @@
         # if there is already an identical block in the cache. This is because
         # we want to make sure the allocated block IDs won't change so that
         # block tables are append-only.
-        self.cached_block_hash_to_block: dict[BlockHashType, dict[
-            int, KVCacheBlock]] = defaultdict(dict)
+        self.cached_block_hash_to_block: dict[BlockHashType, dict[int, dict[
+            int, KVCacheBlock]]] = defaultdict(dict)
 
         # To represent a placeholder block with block_id=0.
         # The ref_cnt of null_block is not maintained, needs special care to
         # avoid freeing it.
         self.null_block = self.free_block_queue.popleft()
         self.null_block.is_null = True
 
-    def get_cached_block(self,
-                         block_hash: BlockHashType) -> Optional[KVCacheBlock]:
+    def get_cached_block(
+        self,
+        block_hash: BlockHashType,
+    ) -> Optional[dict[int, KVCacheBlock]]:
         """Get a cached block by the block hash, or None if cache miss.
         If there are duplicated blocks, we return the first block in the cache.

@@ -70,8 +73,10 @@
         cached_blocks = self.cached_block_hash_to_block.get(block_hash)
         if not cached_blocks:
             return None
-        first_block_id = next(iter(cached_blocks))
-        return cached_blocks[first_block_id]
+        return {
+            group_id: next(iter(blocks))
+            for group_id, blocks in cached_blocks.items() if blocks
+        }
 
     def cache_full_blocks(
         self,

Check failure on line 77 in vllm/v1/core/block_pool.py (GitHub Actions / pre-commit): Value expression in dictionary comprehension has incompatible type "int"; expected type "KVCacheBlock" [misc]
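The pre-commit failure shown above arises because iterating a dict yields its keys, so `next(iter(blocks))` produces an `int` block ID where the new return annotation expects a `KVCacheBlock`. A minimal standalone sketch of one possible fix, using hypothetical stand-in types (a simplified `KVCacheBlock` and `int` block hashes) rather than vLLM's real classes:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class KVCacheBlock:
    """Hypothetical stand-in for vLLM's KVCacheBlock."""
    block_id: int


# The PR's new nested layout: {block_hash: {group ID: {block ID: block}}}.
# BlockHashType is simplified to int here for illustration.
cached_block_hash_to_block: dict[int, dict[int, dict[int, KVCacheBlock]]] = \
    defaultdict(lambda: defaultdict(dict))


def get_cached_block(block_hash: int) -> Optional[dict[int, KVCacheBlock]]:
    """Return the first cached block for each group, or None on a miss."""
    cached_blocks = cached_block_hash_to_block.get(block_hash)
    if not cached_blocks:
        return None
    # Iterating a dict yields keys (int block IDs); .values() makes the
    # comprehension yield KVCacheBlock values, matching the annotation.
    return {
        group_id: next(iter(blocks.values()))
        for group_id, blocks in cached_blocks.items() if blocks
    }


blk = KVCacheBlock(block_id=7)
cached_block_hash_to_block[123][0][7] = blk
hit = get_cached_block(123)   # {0: KVCacheBlock(block_id=7)}
miss = get_cached_block(999)  # None
```

This only illustrates the keys-versus-values issue the type checker flagged; the actual fix in the PR would need to use vLLM's real `KVCacheBlock` and `BlockHashType`.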
@@ -152,7 +157,7 @@
 
             # Update and added the full block to the cache.
             blk.block_hash = block_hash
             self.cached_block_hash_to_block[block_hash][blk.block_id] = blk
             prev_block_hash_value = block_hash.hash_value
 
     def get_new_blocks(self, num_blocks: int) -> list[KVCacheBlock]:

Check failure on line 160 in vllm/v1/core/block_pool.py (GitHub Actions / pre-commit): Incompatible types in assignment (expression has type "KVCacheBlock", target has type "dict[int, KVCacheBlock]") [assignment]
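The second failure is the mirror image: under the new three-level annotation, the old two-level write in `cache_full_blocks` assigns a `KVCacheBlock` where a per-group `dict[int, KVCacheBlock]` is expected. A hedged sketch of how the write side could index one extra level; the `group_id` variable is an assumption for illustration, not code from this PR, and strings stand in for real blocks:

```python
from collections import defaultdict

# Simplified stand-ins: block hashes, group IDs, and block IDs are ints,
# and a plain string stands in for a KVCacheBlock instance.
cached_block_hash_to_block: dict[int, dict[int, dict[int, str]]] = \
    defaultdict(lambda: defaultdict(dict))

block_hash, group_id, block_id = 42, 0, 7
blk = "block-7"  # hypothetical stand-in for a KVCacheBlock

# Old write, which type-checks only against the old two-level layout:
#   cached_block_hash_to_block[block_hash][block_id] = blk
# New write: one extra level keyed by the block's cache group.
cached_block_hash_to_block[block_hash][group_id][block_id] = blk
```

Where the group ID would come from (presumably the KV cache group a request's blocks belong to) is not visible in this hunk, so the name is purely illustrative.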