[test] nothing #17653
| 👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run  Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add  🚀 | 
| very nice! | 
        
          
vllm/engine/arg_utils.py (Outdated)
We already have a --cpu-offloading-gb for offloading model weights... we should use more specific naming for these args to disambiguate.
@chunxiaozheng I suggest using --enable-kvcache-cpu-offloading
Good idea. I have updated.
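To illustrate the naming concern discussed above, here is a minimal, stand-alone sketch using Python's `argparse`. It is not vLLM's actual argument parser: the two flag names are copied from the review comments, while `build_parser`, the program name, the types, and the defaults are assumptions made only for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical toy parser; vLLM builds its arguments differently.
    # allow_abbrev=False rejects prefixes such as "--cpu", so similarly
    # named flags cannot be selected by accident.
    parser = argparse.ArgumentParser(prog="toy-server", allow_abbrev=False)

    # Flag name as quoted in the review comment (model-weight offloading).
    # Type and default are assumptions for this sketch.
    parser.add_argument(
        "--cpu-offloading-gb",
        type=float,
        default=0.0,
        help="Space for offloading model weights to CPU memory.",
    )

    # Flag name suggested in the review (KV-cache offloading).
    parser.add_argument(
        "--enable-kvcache-cpu-offloading",
        action="store_true",
        help="Offload KV-cache blocks to CPU memory.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--enable-kvcache-cpu-offloading"])
    print(args.cpu_offloading_gb, args.enable_kvcache_cpu_offloading)
```

Naming the resource explicitly (`kvcache`) in the second flag makes it clear from the command line alone which of the two offloading mechanisms is being configured.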
| This pull request has merge conflicts that must be resolved before it can be | 
        
          
vllm/v1/worker/gpu_model_runner.py (Outdated)
Should this be "blocks_to_swap_out_buffer"?
Agreed. @chunxiaozheng, is this maybe an accidental typo?
fixed.
| Hello, I would like to know: can this solution run in a 1-CPU/n-GPU setup on a single host? | 
| This pull request has merge conflicts that must be resolved before it can be | 
12f1a3f to f62cad6
  
    
kv cache