forked from vllm-project/vllm
Commit
[Hardware][Intel GPU] Add Intel GPU(XPU) inference backend (vllm-project#3814)
Co-authored-by: Jiang Li <jiang1.li@intel.com>
Co-authored-by: Abhilash Majumder <abhilash.majumder@intel.com>
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
1 parent 1f12122
commit 728c4c8
Showing 31 changed files with 1,998 additions and 24 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
# This script builds the XPU docker image and runs the offline inference inside the container. | ||
# It serves as a sanity check for compilation and basic model usage. | ||
set -ex | ||
|
||
# Try building the docker image | ||
docker build -t xpu-test -f Dockerfile.xpu . | ||
|
||
# Setup cleanup | ||
remove_docker_container() { docker rm -f xpu-test || true; } | ||
trap remove_docker_container EXIT | ||
remove_docker_container | ||
|
||
# Run the image and launch offline inference | ||
docker run --network host --name xpu-test --device /dev/dri -v /dev/dri/by-path:/dev/dri/by-path xpu-test python3 examples/offline_inference.py |
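The `docker run` line above passes `/dev/dri` into the container, so the test can only succeed when the host exposes DRM render nodes there. A minimal pre-flight sketch in Python follows; the helper names `find_render_nodes` and `docker_device_args` are hypothetical and are not part of this commit.

```python
from pathlib import Path


def find_render_nodes(dri_root: str = "/dev/dri") -> list[str]:
    """Return the sorted DRM render nodes (renderD*) found under dri_root."""
    root = Path(dri_root)
    if not root.is_dir():
        # No DRM directory at all: the host exposes no GPU device nodes here.
        return []
    return sorted(str(p) for p in root.glob("renderD*"))


def docker_device_args(dri_root: str = "/dev/dri") -> list[str]:
    """Build the device-related arguments that the test script passes to docker run."""
    by_path = f"{dri_root}/by-path"
    return ["--device", dri_root, "-v", f"{by_path}:{by_path}"]
```

Calling `find_render_nodes()` on the host before `docker build` gives an early, readable failure instead of an error from inside the container.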
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
FROM intel/oneapi-basekit:2024.1.0-devel-ubuntu22.04 | ||
|
||
RUN wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | tee /usr/share/keyrings/intel-oneapi-archive-keyring.gpg > /dev/null && \ | ||
echo "deb [signed-by=/usr/share/keyrings/intel-oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main " | tee /etc/apt/sources.list.d/oneAPI.list && \ | ||
chmod 644 /usr/share/keyrings/intel-oneapi-archive-keyring.gpg && \ | ||
rm /etc/apt/sources.list.d/intel-graphics.list && \ | ||
wget -O- https://repositories.intel.com/graphics/intel-graphics.key | gpg --dearmor | tee /usr/share/keyrings/intel-graphics.gpg > /dev/null && \ | ||
echo "deb [arch=amd64,i386 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/graphics/ubuntu jammy arc" | tee /etc/apt/sources.list.d/intel.gpu.jammy.list && \ | ||
chmod 644 /usr/share/keyrings/intel-graphics.gpg | ||
|
||
RUN apt-get update -y \ | ||
&& apt-get install -y curl libicu70 lsb-release git wget vim numactl python3 python3-pip | ||
|
||
COPY ./ /workspace/vllm | ||
|
||
WORKDIR /workspace/vllm | ||
|
||
RUN pip install -v -r requirements-xpu.txt | ||
|
||
RUN VLLM_TARGET_DEVICE=xpu python3 setup.py install | ||
|
||
CMD ["/bin/bash"] |
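Because the image drops into `/bin/bash`, a quick way to confirm that the build produced a usable environment is to ask PyTorch whether it can see an XPU device. The sketch below assumes that `intel_extension_for_pytorch` (installed from `requirements-xpu.txt`) registers the `torch.xpu` namespace on import; the function name `describe_xpu` is hypothetical.

```python
def describe_xpu() -> str:
    """Summarise whether PyTorch can see an XPU device in this environment."""
    try:
        import torch
        # Assumption: importing IPEX is what makes torch.xpu available
        # for the torch 2.1 wheel pinned in requirements-xpu.txt.
        import intel_extension_for_pytorch  # noqa: F401
    except Exception as exc:  # ImportError, missing shared libraries, ...
        return f"torch/IPEX not importable: {exc}"

    if not torch.xpu.is_available():
        return "torch and IPEX import, but no XPU device is visible"
    return f"{torch.xpu.device_count()} XPU device(s) visible"
```

Running `print(describe_xpu())` inside the container separates a packaging problem (import failure) from a device pass-through problem (no device visible).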
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,61 @@ | ||
.. _installation_xpu: | ||
|
||
Installation with XPU | ||
======================== | ||
|
||
vLLM initially supports basic model inference and serving on Intel GPU platforms. | ||
|
||
Table of contents: | ||
|
||
#. :ref:`Requirements <xpu_backend_requirements>` | ||
#. :ref:`Quick start using Dockerfile <xpu_backend_quick_start_dockerfile>` | ||
#. :ref:`Build from source <build_xpu_backend_from_source>` | ||
|
||
.. _xpu_backend_requirements: | ||
|
||
Requirements | ||
------------ | ||
|
||
* OS: Linux | ||
* Supported Hardware: Intel Data Center GPU (Intel ARC GPU WIP) | ||
* oneAPI requirements: oneAPI 2024.1 | ||
|
||
.. _xpu_backend_quick_start_dockerfile: | ||
|
||
Quick start using Dockerfile | ||
---------------------------- | ||
|
||
.. code-block:: console | ||
|
||
    $ docker build -f Dockerfile.xpu -t vllm-xpu-env --shm-size=4g . | ||
    $ docker run -it \ | ||
                 --rm \ | ||
                 --network=host \ | ||
                 --device /dev/dri \ | ||
                 -v /dev/dri/by-path:/dev/dri/by-path \ | ||
                 vllm-xpu-env | ||
|
||
.. _build_xpu_backend_from_source: | ||
|
||
Build from source | ||
----------------- | ||
|
||
- First, install the required driver and Intel oneAPI 2024.1. | ||
|
||
- Second, install Python packages for vLLM XPU backend building: | ||
|
||
.. code-block:: console | ||
|
||
    $ pip install --upgrade pip | ||
    $ pip install -v -r requirements-xpu.txt | ||
|
||
- Finally, build and install vLLM XPU backend: | ||
|
||
.. code-block:: console | ||
|
||
    $ VLLM_TARGET_DEVICE=xpu python setup.py install | ||
|
||
.. note:: | ||
    - FP16 is the default data type in the current XPU backend. The BF16 data | ||
      type will be supported in the future. | ||
|
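The note above implies that callers should not request BF16 on this backend yet. One way to make that explicit is a small guard in front of a smoke test, sketched below under stated assumptions: `check_xpu_dtype` and `run_smoke_test` are hypothetical helpers, the model `facebook/opt-125m` is only an illustrative choice, and whether dtypes other than FP16 and BF16 work on XPU is not stated in this commit.

```python
# Assumption: taken from the note in the XPU installation doc; BF16 is not yet supported.
UNSUPPORTED_XPU_DTYPES = {"bfloat16"}


def check_xpu_dtype(dtype: str) -> str:
    """Reject data types that the XPU doc says are not supported yet."""
    if dtype in UNSUPPORTED_XPU_DTYPES:
        raise ValueError(f"{dtype} is not supported by the XPU backend yet; use float16")
    return dtype


def run_smoke_test(model: str = "facebook/opt-125m", dtype: str = "float16") -> list[str]:
    """Generate one completion with vLLM; requires a vLLM build with the XPU backend."""
    # Imported lazily so the dtype guard can be used without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model, dtype=check_xpu_dtype(dtype))
    params = SamplingParams(temperature=0.8, top_p=0.95)
    outputs = llm.generate(["Hello, my name is"], params)
    return [out.outputs[0].text for out in outputs]
```

Passing `dtype` explicitly keeps the test independent of whatever default a given model configuration would otherwise select.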
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
# Common dependencies | ||
-r requirements-common.txt | ||
|
||
setuptools < 70.0.0 # Required by IPEX's torch build; to be removed. | ||
|
||
torch @ https://intel-extension-for-pytorch.s3.amazonaws.com/ipex_dev/xpu/torch-2.1.0.post1%2Bcxx11.abi-cp310-cp310-linux_x86_64.whl | ||
intel_extension_for_pytorch @ https://intel-extension-for-pytorch.s3.amazonaws.com/ipex_dev/xpu/intel_extension_for_pytorch-2.1.30a0-cp310-cp310-linux_x86_64.whl | ||
oneccl_bind_pt @ https://intel-extension-for-pytorch.s3.amazonaws.com/ipex_stable/xpu/oneccl_bind_pt-2.1.200%2Bxpu-cp310-cp310-linux_x86_64.whl | ||
|
||
triton @ https://github.com/intel/intel-xpu-backend-for-triton/releases/download/v2.1.0/triton-2.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl | ||
|
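All four pinned wheels carry the `cp310` tag, so `pip install -r requirements-xpu.txt` can only succeed on CPython 3.10 (and the torch, IPEX and oneCCL wheels are additionally tagged `linux_x86_64`). The sketch below extracts the tags from a wheel URL following the standard wheel filename layout; the helper names are hypothetical, and the check deliberately handles only single CPython tags like the ones used here.

```python
import sys
from urllib.parse import unquote, urlparse


def wheel_tags(url: str) -> tuple[str, str, str]:
    """Return (python_tag, abi_tag, platform_tag) parsed from a wheel URL."""
    filename = unquote(urlparse(url).path.rsplit("/", 1)[-1])
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel URL: {url}")
    stem = filename[: -len(".whl")]
    # Wheel names end with -{python tag}-{abi tag}-{platform tag}.
    python_tag, abi_tag, platform_tag = stem.split("-")[-3:]
    return python_tag, abi_tag, platform_tag


def interpreter_matches(url: str, version_info=sys.version_info) -> bool:
    """True if the wheel's CPython tag matches the given interpreter version."""
    python_tag, _, _ = wheel_tags(url)
    return python_tag == f"cp{version_info[0]}{version_info[1]}"
```

Running `interpreter_matches` over the URLs in the requirements file before installing turns an opaque pip resolution error into a clear version mismatch message.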