Update vllm to use latest upstream to support CPU #179
Conversation
Can we directly install vllm from pip, since there is a v0.4.0.post1 release now?
Let me enable the PR; it seems the code is not in the watch list.
"vllm 0.4.0.post1+cpu requires torch==2.1.2+cpu, but you have torch 2.2.2+cpu which is incompatible." Please use a specific commit on vllm master branch which has upgraded to 2.2.1+cpu. |
I don't think there is such a commit. We need to fix the torch version requirements.
You can try the current vllm main branch to see if it passes CI. If so, just use that commit.
Our requirement is torch>=2.2.0, so it should be compatible with vLLM's requirement (==2.2.1+cpu).
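For anyone hitting this locally, a quick way to see whether the installed torch satisfies both constraints is to compare versions programmatically. The sketch below is only illustrative: the `>=2.2.0` and `==2.2.1+cpu` specifiers are taken from this thread, and it assumes the `packaging` library is available.

```python
# Minimal sketch: check the installed torch against both requirements.
from importlib.metadata import version
from packaging.specifiers import SpecifierSet
from packaging.version import Version

installed = Version(version("torch"))        # e.g. 2.2.1+cpu
requirements = {
    "llm-on-ray": SpecifierSet(">=2.2.0"),   # requirement stated in this thread
    "vllm-cpu":   SpecifierSet("==2.2.1+cpu"),  # vLLM CPU build, per the discussion
}

for name, spec in requirements.items():
    ok = spec.contains(installed, prereleases=True)
    print(f"torch {installed} satisfies {name} ({spec}): {ok}")
```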
Updated the version, but the "RuntimeError: Not support device type: cpu" comes from vllm's CPU backend; I am reporting it.
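For context, the change that eventually landed (see "Remove device=infer_conf.device and add comment explaining why" in the squashed commit list at the end) is to stop forwarding an explicit device to vLLM on CPU and let the CPU build choose its own device. A minimal, hypothetical sketch of that idea follows; the model name and arguments are placeholders, not the actual llm-on-ray code.

```python
# Hedged sketch: do not pass device="cpu" (e.g. device=infer_conf.device) to
# the engine on the CPU build; this version raised
# "RuntimeError: Not support device type: cpu" when an explicit device was given.
from vllm import LLM, SamplingParams

# Let the vLLM CPU build infer the device itself.
llm = LLM(model="facebook/opt-125m")  # placeholder model

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
for out in outputs:
    print(out.outputs[0].text)
```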
Can you please update the user documentation on how to use vllm as the backend?
Add VLLM_CPU_KVCACHE_SPACE_DEFAULT constant to control the size of the CPU key-value cache
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
Nothing to update. Still follow docs/vllm.md, as the setup is done through an install script.
@carsonwang CI passed. Use `huggingface-cli login` to access the gated llama2 model.
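Where an interactive `huggingface-cli login` is inconvenient (for example in CI), the same authentication can be done programmatically with the public huggingface_hub API. This is only a sketch; the HF_TOKEN environment variable name is an assumption, not something defined by this PR.

```python
# Sketch: authenticate to Hugging Face non-interactively so gated models
# such as Llama 2 can be downloaded. HF_TOKEN is an assumed env var name.
import os
from huggingface_hub import login

token = os.environ.get("HF_TOKEN")
if token:
    login(token=token)
else:
    print("HF_TOKEN not set; gated models will not be accessible.")
```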
* update vllm to use upstream v0.4.0.post1
* nit
* adjust watch list
* Add llm_on_ray package installation and set CPU key-value cache size
* Remove device=infer_conf.device and add comment explaining why
* Update VLLM installation script to use main commit
* Update GCC version detection in install-vllm-cpu.sh script
* Update vllm-cpu installation method
* Fix Docker build command and update YAML configuration files
* Add VLLM_CPU_KVCACHE_SPACE_DEFAULT constant to control the size of the CPU key-value cache
* update
* nit
* Update default value of VLLM_CPU_KVCACHE_SPACE to 40GB
* Fix indentation in workflow_inference.yml
* debug
* debug
* debug
* nit
* nit
* debug
* Enable non-gated and gated models access
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
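For readers reproducing this setup: vLLM's CPU backend sizes its key-value cache from the VLLM_CPU_KVCACHE_SPACE environment variable (in GiB). The sketch below shows one way the 40 GB default mentioned in the commit list above could be applied before engine creation; it reuses the VLLM_CPU_KVCACHE_SPACE_DEFAULT name for illustration and is not the actual llm-on-ray implementation.

```python
# Sketch: apply a default CPU KV-cache size before creating the vLLM engine.
# VLLM_CPU_KVCACHE_SPACE is read by vLLM's CPU backend (value in GiB);
# 40 matches the default mentioned in the commit list above.
import os

VLLM_CPU_KVCACHE_SPACE_DEFAULT = 40  # GiB

# Respect an explicit user setting; otherwise fall back to the default.
os.environ.setdefault("VLLM_CPU_KVCACHE_SPACE", str(VLLM_CPU_KVCACHE_SPACE_DEFAULT))

print("CPU KV cache space (GiB):", os.environ["VLLM_CPU_KVCACHE_SPACE"])
```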