- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 10.9k
[v1] Move block_hashes from KVCacheManager to Request.block_hashes (#19728) #19728
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! | 
| 👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run  Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add  🚀 | 
| This pull request has merge conflicts that must be resolved before it can be | 
457540f    to
    a34e6ac      
    Compare
  
    | This pull request has merge conflicts that must be resolved before it can be | 
a34e6ac    to
    15001b4      
    Compare
  
    | This pull request has merge conflicts that must be resolved before it can be | 
15001b4    to
    914876e      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks reasonable to me but would be good to get approval from @heheda12345
and @WoosukKwon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My major concern is that the block_hash list is updated here 
vllm/vllm/v1/core/block_pool.py
Line 177 in d3f05c9
| block_hashes.append(block_hash) | 
It is quite far from
Request class. It will make the number of element in the list magic to connectors, and hurt the modularity of the code.Therefore, if you want to put it into
Request class, can you move this logic from "update when block hash is needed" to "update when token ids of the request are updated"?
      
    | 
 Today,  I'm trying to interpret your suggestion "update when token ids of the request are updated". | 
914876e    to
    234f334      
    Compare
  
    | Yes I mean  Lines 223 to 240 in 72d14d0 
 | 
| 
 @heheda12345 Consider the following scenario: 
 So to sum-up, because of the delay in setting of  | 
| But I think the hash of the prompt can be initialized when the request is created | 
| @orozery could you rebase on latest main? Should hopefully help with at least some of the CI failures. | 
Head branch was pushed to by a user without write access
0241ee6    to
    564a359      
    Compare
  
    | 
 done | 
Head branch was pushed to by a user without write access
b3e1288    to
    e440daf      
    Compare
  
    | Can you rebase with main again? It should fix the test_eagle_correctness failure. | 
This commit move the request block hashes from the KVCacheManager to the Request object itself. In particular, this will allow connectors to access the request block hashes. Signed-off-by: Or Ozeri <oro@il.ibm.com>
e440daf    to
    3d0762a      
    Compare
  
    …llm-project#19728) Signed-off-by: Or Ozeri <oro@il.ibm.com> Signed-off-by: Yiwen Chen <yiwen66@berkeley.edu>
…llm-project#19728) Signed-off-by: Or Ozeri <oro@il.ibm.com>
…llm-project#19728) Signed-off-by: Or Ozeri <oro@il.ibm.com>
…llm-project#19728) Signed-off-by: Or Ozeri <oro@il.ibm.com> Signed-off-by: Duncan Moss <djm.moss@gmail.com>
…llm-project#19728) Signed-off-by: Or Ozeri <oro@il.ibm.com>
…llm-project#19728) Signed-off-by: Or Ozeri <oro@il.ibm.com> Signed-off-by: Xiao Yu <xiao.yu@amd.com>
…llm-project#19728) Signed-off-by: Or Ozeri <oro@il.ibm.com>
This PR move the request block hashes from the KVCacheManager to the Request object itself.
In particular, this will allow connectors to access the request block hashes.
Part of the work described in RFC #19854