
Commit

adding toc
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
zhouyuan committed Oct 8, 2024
1 parent a31bfae commit acb0913
Showing 2 changed files with 29 additions and 1 deletion.
7 changes: 6 additions & 1 deletion docs/source/getting_started/cpu-installation.rst
@@ -3,7 +5,12 @@
Installation with CPU
========================

vLLM initially supports basic model inferencing and serving on the x86 CPU platform, with data types FP32 and BF16. The vLLM CPU backend supports the following vLLM features:

- Tensor Parallel (``--tensor-parallel-size = N``)
- Quantization (``W8A8``, ``AWQ``)

More advanced features, such as chunked prefill and the FP8 KV cache, are under development and will be available soon.
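
As an illustration of the tensor-parallel flag listed above, a CPU serving command might look like the following sketch (the model name and dtype are placeholders, not taken from this commit):

```shell
# Sketch of an OpenAI-compatible serving command on CPU with TP=2.
# MODEL is an illustrative placeholder; substitute your own.
MODEL="facebook/opt-125m"
CMD="python3 -m vllm.entrypoints.openai.api_server --model $MODEL --tensor-parallel-size 2 --dtype bfloat16"
echo "$CMD"
```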

Table of contents:

23 changes: 23 additions & 0 deletions docs/source/getting_started/nginx-loadbalancer.rst
@@ -5,6 +5,16 @@ Nginx Loadbalancer

This document shows how to launch multiple vLLM serving containers and use Nginx to act as a load balancer between the servers.

Table of contents:

#. :ref:`Build Nginx Container <nginxloadbalancer_nginx_build>`
#. :ref:`Create Simple Nginx Config file <nginxloadbalancer_nginx_conf>`
#. :ref:`Build vLLM Container <nginxloadbalancer_nginx_vllm_container>`
#. :ref:`Create Docker Network <nginxloadbalancer_nginx_docker_network>`
#. :ref:`Launch vLLM Containers <nginxloadbalancer_nginx_launch_container>`
#. :ref:`Launch Nginx <nginxloadbalancer_nginx_launch_nginx>`
#. :ref:`Verify That vLLM Servers Are Ready <nginxloadbalancer_nginx_verify_nginx>`

.. _nginxloadbalancer_nginx_build:

Build Nginx Container
@@ -34,6 +44,8 @@ Build the container:
   docker build . -f Dockerfile.nginx --tag nginx-lb

.. _nginxloadbalancer_nginx_conf:

Create Simple Nginx Config file
-------------------------------

@@ -57,6 +69,8 @@ Create a file named ``nginx_conf/nginx.conf``. Note that you can add as many servers as you'd like.
       }
   }

.. _nginxloadbalancer_nginx_vllm_container:

Build vLLM Container
--------------------

@@ -71,13 +85,18 @@ Notes:
   sed -i "s|ENTRYPOINT \[\"python3\", \"-m\", \"vllm.entrypoints.openai.api_server\"\]|ENTRYPOINT [\"python3\", \"-m\", \"vllm.entrypoints.openai.api_server\", \"--model\", \"$model\"]|" Dockerfile.cpu
   docker build -f Dockerfile.cpu . --tag vllm --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy

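
To see what the ``sed`` command above does, here is the same substitution applied to a stand-in ``ENTRYPOINT`` line rather than the real ``Dockerfile.cpu`` (the model name is an example, not from this commit):

```shell
# Stand-in for the ENTRYPOINT line found in Dockerfile.cpu.
model="meta-llama/Meta-Llama-3-8B"
line='ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]'
# Same substitution as the command above: append --model <name> to the entrypoint.
new_line=$(echo "$line" | sed "s|ENTRYPOINT \[\"python3\", \"-m\", \"vllm.entrypoints.openai.api_server\"\]|ENTRYPOINT [\"python3\", \"-m\", \"vllm.entrypoints.openai.api_server\", \"--model\", \"$model\"]|")
echo "$new_line"
```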
.. _nginxloadbalancer_nginx_docker_network:

Create Docker Network
---------------------

.. code-block:: console

   docker network create vllm_nginx

.. _nginxloadbalancer_nginx_launch_container:

Launch vLLM Containers
----------------------

@@ -96,13 +115,17 @@ Notes:
   docker run -itd --ipc host --privileged --network vllm_nginx --cap-add=SYS_ADMIN --shm-size=10.24gb -e VLLM_CPU_KVCACHE_SPACE=40 -e VLLM_CPU_OMP_THREADS_BIND=$SVR_0_CORES -e http_proxy=$http_proxy -e https_proxy=$https_proxy -v $hf_cache_dir:/root/.cache/huggingface/ -p 8081:8000 --name vllm0 vllm
   docker run -itd --ipc host --privileged --network vllm_nginx --cap-add=SYS_ADMIN --shm-size=10.24gb -e VLLM_CPU_KVCACHE_SPACE=40 -e VLLM_CPU_OMP_THREADS_BIND=$SVR_1_CORES -e http_proxy=$http_proxy -e https_proxy=$https_proxy -v $hf_cache_dir:/root/.cache/huggingface/ -p 8082:8000 --name vllm1 vllm

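
The ``VLLM_CPU_OMP_THREADS_BIND`` values above come from per-server core lists. A sketch of how they might be set on a 64-core host (the ranges are illustrative; derive yours from ``lscpu`` output):

```shell
# Bind each serving container to a disjoint set of cores.
# 0-31 and 32-63 are example ranges for a 64-core machine.
SVR_0_CORES=0-31
SVR_1_CORES=32-63
echo "server0: $SVR_0_CORES, server1: $SVR_1_CORES"
```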
.. _nginxloadbalancer_nginx_launch_nginx:

Launch Nginx
------------

.. code-block:: console

   docker run -itd -p 8000:80 --network vllm_nginx -v ./nginx_conf/:/etc/nginx/conf.d/ --name nginx-lb nginx-lb:latest

.. _nginxloadbalancer_nginx_verify_nginx:

Verify That vLLM Servers Are Ready
----------------------------------

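
The body of this section is collapsed in the diff; as a sketch of such a readiness check (ports follow the container launches above; 8000 is the Nginx front end, 8081/8082 the individual backends):

```shell
# Probe each server's OpenAI-compatible endpoint; print a note if it
# does not answer yet instead of aborting.
for port in 8000 8081 8082; do
  curl -s --max-time 2 "http://localhost:${port}/v1/models" || echo "port ${port} not ready yet"
done
```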

