
Commit 546b3ea

DarkLight1337 and Isotr0py authored and committed
[Doc] [1/N] Reorganize Getting Started section (vllm-project#11645)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
1 parent c66ce85 commit 546b3ea

22 files changed: 54 additions, 41 deletions

docs/source/design/arch_overview.md

Lines changed: 1 addition & 2 deletions
@@ -77,8 +77,7 @@ python -m vllm.entrypoints.openai.api_server --model <model>
 That code can be found in <gh-file:vllm/entrypoints/openai/api_server.py>.
-More details on the API server can be found in the {doc}`OpenAI Compatible
-Server </serving/openai_compatible_server>` document.
+More details on the API server can be found in the [OpenAI-Compatible Server](#openai-compatible-server) document.
 ## LLM Engine
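For orientation, here is a minimal sketch of querying that API server once it has been started with the command shown in the hunk header above. This is not part of the commit; it assumes the default bind address of localhost:8000 and uses the same `<model>` placeholder.

```console
$ # Start the server as in the hunk header, then issue an OpenAI-style completion request.
$ curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "<model>", "prompt": "Hello, my name is", "max_tokens": 16}'
```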

docs/source/design/multiprocessing.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 ## Debugging
-Please see the [Debugging Tips](#debugging-python-multiprocessing)
+Please see the [Troubleshooting](#troubleshooting-python-multiprocessing)
 page for information on known issues and how to solve them.
 ## Introduction

File renamed without changes.

docs/source/getting_started/arm-installation.md renamed to docs/source/getting_started/installation/cpu-arm.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 # Installation for ARM CPUs
-vLLM has been adapted to work on ARM64 CPUs with NEON support, leveraging the CPU backend initially developed for the x86 platform. This guide provides installation instructions specific to ARM. For additional details on supported features, refer to the x86 platform documentation covering:
+vLLM has been adapted to work on ARM64 CPUs with NEON support, leveraging the CPU backend initially developed for the x86 platform. This guide provides installation instructions specific to ARM. For additional details on supported features, refer to the [x86 CPU documentation](#installation-x86) covering:
 - CPU backend inference capabilities
 - Relevant runtime environment variables

docs/source/getting_started/cpu-installation.md renamed to docs/source/getting_started/installation/cpu-x86.md

Lines changed: 3 additions & 3 deletions
@@ -1,6 +1,6 @@
-(installation-cpu)=
+(installation-x86)=
-# Installation with CPU
+# Installation for x86 CPUs
 vLLM initially supports basic model inferencing and serving on x86 CPU platform, with data types FP32, FP16 and BF16. vLLM CPU backend supports the following vLLM features:
@@ -151,4 +151,4 @@ $ python examples/offline_inference.py
 $ VLLM_CPU_KVCACHE_SPACE=40 VLLM_CPU_OMP_THREADS_BIND="0-31|32-63" vllm serve meta-llama/Llama-2-7b-chat-hf -tp=2 --distributed-executor-backend mp
 ```
-- Using Data Parallel for maximum throughput: to launch an LLM serving endpoint on each NUMA node along with one additional load balancer to dispatch the requests to those endpoints. Common solutions like [Nginx](../serving/deploying_with_nginx.md) or HAProxy are recommended. Anyscale Ray project provides the feature on LLM [serving](https://docs.ray.io/en/latest/serve/index.html). Here is the example to setup a scalable LLM serving with [Ray Serve](https://github.com/intel/llm-on-ray/blob/main/docs/setup.md).
+- Using Data Parallel for maximum throughput: to launch an LLM serving endpoint on each NUMA node along with one additional load balancer to dispatch the requests to those endpoints. Common solutions like [Nginx](#nginxloadbalancer) or HAProxy are recommended. Anyscale Ray project provides the feature on LLM [serving](https://docs.ray.io/en/latest/serve/index.html). Here is the example to setup a scalable LLM serving with [Ray Serve](https://github.com/intel/llm-on-ray/blob/main/docs/setup.md).
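To make the data-parallel bullet above concrete, here is a minimal, hypothetical sketch (not part of the commit) for a two-socket host, assuming cores 0-31 sit on NUMA node 0 and cores 32-63 on node 1; adjust the core ranges and ports to your topology. The Nginx/HAProxy configuration itself is out of scope here.

```console
$ # One serving endpoint per NUMA node, each pinned to its own cores and listening on its own port.
$ VLLM_CPU_KVCACHE_SPACE=40 VLLM_CPU_OMP_THREADS_BIND="0-31"  vllm serve meta-llama/Llama-2-7b-chat-hf --port 8001 &
$ VLLM_CPU_KVCACHE_SPACE=40 VLLM_CPU_OMP_THREADS_BIND="32-63" vllm serve meta-llama/Llama-2-7b-chat-hf --port 8002 &
$ # A load balancer (Nginx, HAProxy, or Ray Serve) then dispatches requests across :8001 and :8002.
```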

docs/source/getting_started/installation.md renamed to docs/source/getting_started/installation/gpu-cuda.md

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
-(installation)=
+(installation-cuda)=
-# Installation
+# Installation for CUDA
 vLLM is a Python library that also contains pre-compiled C++ and CUDA (12.1) binaries.
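As a point of reference, the default way to get those pre-compiled binaries is a plain pip install; this sketch is not part of the commit and assumes a CUDA-capable environment.

```console
$ # Installs the prebuilt wheels from PyPI; source builds and other platforms are covered by the renamed pages.
$ pip install vllm
```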

docs/source/getting_started/amd-installation.md renamed to docs/source/getting_started/installation/gpu-rocm.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 (installation-rocm)=
-# Installation with ROCm
+# Installation for ROCm
 vLLM supports AMD GPUs with ROCm 6.2.

docs/source/getting_started/gaudi-installation.md renamed to docs/source/getting_started/installation/hpu-gaudi.md

Lines changed: 3 additions & 1 deletion
@@ -1,4 +1,6 @@
-# Installation with Intel® Gaudi® AI Accelerators
+(installation-gaudi)=
+
+# Installation for Intel® Gaudi®
 This README provides instructions on running vLLM with Intel Gaudi devices.

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+(installation-index)=
+
+# Installation
+
+vLLM supports the following hardware platforms:
+
+```{toctree}
+:maxdepth: 1
+
+gpu-cuda
+gpu-rocm
+cpu-x86
+cpu-arm
+hpu-gaudi
+tpu
+xpu
+openvino
+neuron
+```

docs/source/getting_started/neuron-installation.md renamed to docs/source/getting_started/installation/neuron.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 (installation-neuron)=
-# Installation with Neuron
+# Installation for Neuron
 vLLM 0.3.3 onwards supports model inferencing and serving on AWS Trainium/Inferentia with Neuron SDK with continuous batching.
 Paged Attention and Chunked Prefill are currently in development and will be available soon.
