Update vllm to use latest upstream to support CPU #179
Conversation
Can we directly install vllm from pip, since there is a v0.4.0.post1 release now?
Let me enable the PR; it seems the code is not in the watch list.
"vllm 0.4.0.post1+cpu requires torch==2.1.2+cpu, but you have torch 2.2.2+cpu which is incompatible." Please use a specific commit on vllm master branch which has upgraded to 2.2.1+cpu. |
I don't think there is such a commit. We need to fix the torch version requirements.
You can try the current vllm main branch to see if it passes CI. If so, just use that commit.
Our requirement is torch>=2.2.0, so it should be compatible with vLLM's requirement (==2.2.1+cpu).
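For anyone hitting this locally, a quick way to see whether the installed torch satisfies both constraints is to compare versions programmatically. The sketch below is only illustrative: the `>=2.2.0` and `==2.2.1+cpu` specifiers are taken from this thread, and it assumes the `packaging` library is available.

```python
# Minimal sketch: check the installed torch against both requirements.
from importlib.metadata import version
from packaging.specifiers import SpecifierSet
from packaging.version import Version

installed = Version(version("torch"))        # e.g. 2.2.1+cpu
requirements = {
    "llm-on-ray": SpecifierSet(">=2.2.0"),   # requirement stated in this thread
    "vllm-cpu":   SpecifierSet("==2.2.1+cpu"),  # vLLM CPU build, per the discussion
}

for name, spec in requirements.items():
    ok = spec.contains(installed, prereleases=True)
    print(f"torch {installed} satisfies {name} ({spec}): {ok}")
```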
Updated the version, but the "RuntimeError: Not support device type: cpu" comes from vllm's CPU backend; I am reporting it.
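For context, the change that eventually landed (see "Remove device=infer_conf.device and add comment explaining why" in the squashed commit list at the end) is to stop forwarding an explicit device to vLLM on CPU and let the CPU build choose its own device. A minimal, hypothetical sketch of that idea follows; the model name and arguments are placeholders, not the actual llm-on-ray code.

```python
# Hedged sketch: do not pass device="cpu" (e.g. device=infer_conf.device) to
# the engine on the CPU build; this version raised
# "RuntimeError: Not support device type: cpu" when an explicit device was given.
from vllm import LLM, SamplingParams

# Let the vLLM CPU build infer the device itself.
llm = LLM(model="facebook/opt-125m")  # placeholder model

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
for out in outputs:
    print(out.outputs[0].text)
```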
Can you please update the user documentation on how to use vllm as the backend?
Add VLLM_CPU_KVCACHE_SPACE_DEFAULT constant to control the size of the CPU key-value cache
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
Nothing to update. Still follow docs/vllm.md, as the setup is done through an install script.
@carsonwang CI passed. Use `huggingface-cli login` to access the gated llama2 model.
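Where an interactive `huggingface-cli login` is inconvenient (for example in CI), the same authentication can be done programmatically with the public huggingface_hub API. This is only a sketch; the HF_TOKEN environment variable name is an assumption, not something defined by this PR.

```python
# Sketch: authenticate to Hugging Face non-interactively so gated models
# such as Llama 2 can be downloaded. HF_TOKEN is an assumed env var name.
import os
from huggingface_hub import login

token = os.environ.get("HF_TOKEN")
if token:
    login(token=token)
else:
    print("HF_TOKEN not set; gated models will not be accessible.")
```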
* update vllm to use upstream v0.4.0.post1
* nit
* adjust watch list
* Add llm_on_ray package installation and set CPU key-value cache size
* Remove device=infer_conf.device and add comment explaining why
* Update VLLM installation script to use main commit
* Update GCC version detection in install-vllm-cpu.sh script
* Update vllm-cpu installation method
* Fix Docker build command and update YAML configuration files
* Add VLLM_CPU_KVCACHE_SPACE_DEFAULT constant to control the size of the CPU key-value cache
* update
* nit
* Update default value of VLLM_CPU_KVCACHE_SPACE to 40GB
* Fix indentation in workflow_inference.yml
* debug
* debug
* debug
* nit
* nit
* debug
* Enable non-gated and gated models access
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
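For readers reproducing this setup: vLLM's CPU backend sizes its key-value cache from the VLLM_CPU_KVCACHE_SPACE environment variable (in GiB). The sketch below shows one way the 40 GB default mentioned in the commit list above could be applied before engine creation; it reuses the VLLM_CPU_KVCACHE_SPACE_DEFAULT name for illustration and is not the actual llm-on-ray implementation.

```python
# Sketch: apply a default CPU KV-cache size before creating the vLLM engine.
# VLLM_CPU_KVCACHE_SPACE is read by vLLM's CPU backend (value in GiB);
# 40 matches the default mentioned in the commit list above.
import os

VLLM_CPU_KVCACHE_SPACE_DEFAULT = 40  # GiB

# Respect an explicit user setting; otherwise fall back to the default.
os.environ.setdefault("VLLM_CPU_KVCACHE_SPACE", str(VLLM_CPU_KVCACHE_SPACE_DEFAULT))

print("CPU KV cache space (GiB):", os.environ["VLLM_CPU_KVCACHE_SPACE"])
```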