Update vllm to use latest upstream to support CPU #179


Merged: 23 commits merged into intel:main on Apr 28, 2024

Conversation

@xwu-intel (Author)

Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
xwu-intel linked an issue on Apr 8, 2024 that may be closed by this pull request
xwu-intel requested a review from carsonwang on Apr 8, 2024 04:15
xwu-intel self-assigned this on Apr 8, 2024
@carsonwang (Contributor)

Can we directly install vllm from pip as there is a v0.4.0.post1 release now?
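
For reference, installing the tagged release would be a one-line pip command. A minimal sketch, assuming the plain PyPI wheel is acceptable (the CPU-specific build discussed below may still require a source build):

```bash
# Hedged sketch: install the tagged vLLM release from PyPI.
# The CPU-optimized variant (0.4.0.post1+cpu) is not published on PyPI, so the
# CPU backend may still need a source build; this is illustrative only.
pip install vllm==0.4.0.post1
```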

@xwu-intel (Author)

Let me enable the PR checks; it seems the code is not in the watch list.

xwu-intel requested a review from jiafuzha on Apr 8, 2024 06:17
@carsonwang (Contributor)

"vllm 0.4.0.post1+cpu requires torch==2.1.2+cpu, but you have torch 2.2.2+cpu which is incompatible." Please use a specific commit on vllm master branch which has upgraded to 2.2.1+cpu.

@xwu-intel (Author)

"vllm 0.4.0.post1+cpu requires torch==2.1.2+cpu, but you have torch 2.2.2+cpu which is incompatible." Please use a specific commit on vllm master branch which has upgraded to 2.2.1+cpu.

I don't think there is such a commit. We need to fix the torch version requirements.

@carsonwang (Contributor)

"vllm 0.4.0.post1+cpu requires torch==2.1.2+cpu, but you have torch 2.2.2+cpu which is incompatible." Please use a specific commit on vllm master branch which has upgraded to 2.2.1+cpu.

I don't think there is such commit. We need to fix torch version requirements.

You can try the current vllm main branch to see if it can pass CI. If so, just use the current commit.
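
A hedged sketch of what pinning to a specific main-branch commit could look like; the commit SHA is a placeholder and the CPU build steps are assumptions based on vLLM's documented source build, not commands taken from this PR:

```bash
# Hedged sketch: build vLLM's CPU backend from a pinned main-branch commit.
git clone https://github.com/vllm-project/vllm.git
cd vllm
git checkout <pinned-commit-sha>   # placeholder; choose a commit whose CPU build uses torch 2.2.x
pip install -r requirements-cpu.txt --extra-index-url https://download.pytorch.org/whl/cpu
VLLM_TARGET_DEVICE=cpu pip install -v .   # assumed build flag for the CPU target
```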

@carsonwang (Contributor)

Our requirement is torch>=2.2.0, so it should be compatible with vLLM's requirement (torch==2.2.1+cpu).

@xwu-intel (Author)

> Our requirement is torch>=2.2.0, so it should be compatible with vLLM's requirement (torch==2.2.1+cpu).

Updated the version. But the "RuntimeError: Not support device type: cpu" error comes from vLLM's CPU backend itself; I am reporting it upstream.

xwu-intel changed the title from "Update vllm to use upstream v0.4.0.post1" to "Update vllm to use latest upstream to support CPU" on Apr 16, 2024
@carsonwang (Contributor)

Can you please update the user documentation on how to use vLLM as the backend?

@xwu-intel (Author) commented Apr 28, 2024

> Can you please update the user documentation on how to use vLLM as the backend?

Nothing to update. Users still follow docs/vllm.md, since the setup is done through an install script.
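
As a hedged illustration of that flow (only the script name install-vllm-cpu.sh appears in this PR's commit history; the directory below is an assumption):

```bash
# Hedged sketch: the documented setup runs an install script rather than a manual pip install.
bash dev/scripts/install-vllm-cpu.sh   # path is assumed; see docs/vllm.md for the documented location
```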

xwu-intel requested a review from carsonwang on Apr 28, 2024 04:37
@xwu-intel (Author)

@carsonwang CI passed. Use `huggingface-cli login` to access the gated Llama 2 model.
We could switch to using a saved .cache/huggingface/token file so there is no need to log in every time.
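
A minimal sketch of the two access options mentioned above; both are standard Hugging Face Hub usage rather than code from this PR, and HF_TOKEN is an assumed environment variable holding a valid access token:

```bash
# Option 1: interactive login (must be repeated on each fresh environment).
huggingface-cli login

# Option 2: write the token to the default cache location read by the Hub libraries,
# so CI does not need an interactive login. HF_TOKEN is assumed to hold the token.
mkdir -p ~/.cache/huggingface
echo "$HF_TOKEN" > ~/.cache/huggingface/token
```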

carsonwang merged commit 4e81eb2 into intel:main on Apr 28, 2024
23 checks passed
harborn pushed a commit to harborn/llm-on-ray that referenced this pull request May 8, 2024
* update vllm to use upstream v0.4.0.post1
* nit
* adjust watch list
* Add llm_on_ray package installation and set CPU key-value cache size
* Remove device=infer_conf.device and add comment explaining why
* Update VLLM installation script to use main commit
* Update GCC version detection in install-vllm-cpu.sh script
* Update vllm-cpu installation method
* Fix Docker build command and update YAML configuration files
* Add VLLM_CPU_KVCACHE_SPACE_DEFAULT constant to control the size of the CPU key-value cache
* update
* nit
* Update default value of VLLM_CPU_KVCACHE_SPACE to 40GB
* Fix indentation in workflow_inference.yml
* debug
* debug
* debug
* nit
* nit
* debug
* Enable non-gated and gated models access

Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
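
For reference, a hedged sketch of how the VLLM_CPU_KVCACHE_SPACE setting mentioned in the commits above is typically applied; only the variable name and the 40GB default come from this PR, and the shell usage itself is an assumption:

```bash
# Hedged sketch: size the CPU key-value cache (in GB) before launching inference.
# 40 matches the default set in this PR.
export VLLM_CPU_KVCACHE_SPACE=40
# ...then start serving as described in docs/vllm.md.
```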
Successfully merging this pull request may close these issues:

Install vLLM from upstream as CPU backend