-
-
Notifications
You must be signed in to change notification settings - Fork 8.4k
[WIP] Hybrid Memory Allocator #16178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
This implementation is quite simple and some designs inspire me a lot. But I think it may be too specialized. For the key differences you mentioned:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My two cents to the discussion points:
- What isn't clear to me in this PR is how will we initialize the hybrid memory allocator, specifically how will we pass arguments to
SingleMemoryAllocator
andFullAndSwaMemoryAllocator
given they expect different arguments. If we could unify the arguments ofHybridMemoryAllocator
, it seems to me that we are able to support both 2 and 6 managers easily, so that it doesn't matter whetherFullAndSwaMemoryAllocator
in this PR has 2 or 6 managers. - ooc, is the reason of not hashing group IDs only because of the complexity, or is there any other motivations?
- So the structure becomes
KVCacheManager.HybridMemoryAllocator
IIUC. It seems reasonable to me. One point worth to discuss is how people understand the scope of "manager" and "allocator". Intuitively, "allocator" is in charge of allocating blocks, but this is not the current allocator is doing. On the other hand, if we move all allocation logic to the allocator, then it seems unnecessary to introduce the allocator at all (just like Chen's PR). In short, if the current logic in the allocator is the only specialized logic for hybrid memory, then I'd prefer the solution in this PR (but with another name than the allocator); otherwise it might need more discussions.
vllm/v1/core/specialized_manager.py
Outdated
if all(group_id in cached_blocks for group_id in group_ids): | ||
computed_blocks.append(cached_blocks) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: Probably need a fast path for the case of len(group_ids)==1
.
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
This pull request has merge conflicts that must be resolved before it can be |
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
Key differences from #16101: