
Commit f3fed6d

hmellor authored and rasmith committed
[Doc] Organise installation documentation into categories and tabs (vllm-project#11935)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
1 parent 4f1b748 · commit f3fed6d

File tree

21 files changed: +1241 −392 lines

docs/source/conf.py

Lines changed: 1 addition & 1 deletion
@@ -56,7 +56,7 @@
 # List of patterns, relative to source directory, that match files and
 # directories to ignore when looking for source files.
 # This pattern also affects html_static_path and html_extra_path.
-exclude_patterns: List[str] = ["**/*.template.md"]
+exclude_patterns: List[str] = ["**/*.template.md", "**/*.inc.md"]

 # Exclude the prompt "$" when copying code
 copybutton_prompt_text = r"\$ "
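The added `**/*.inc.md` pattern keeps the new per-device partials out of the standalone Sphinx build, so they render only where a parent page includes them. A quick local check might look like this (a sketch; the requirements-file path is an assumption):

```console
# Hedged sketch: build the docs and confirm the *.inc.md partials no longer
# produce standalone pages (requirements file path is an assumption).
pip install -r docs/requirements-docs.txt
python -m sphinx -b html docs/source docs/build/html
```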

docs/source/deployment/docker.md

Lines changed: 4 additions & 0 deletions
@@ -2,6 +2,8 @@

 # Using Docker

+(deployment-docker-pre-built-image)=
+
 ## Use vLLM's Official Docker Image

 vLLM offers an official Docker image for deployment.
@@ -23,6 +25,8 @@ container to access the host's shared memory. vLLM uses PyTorch, which uses shar
 memory to share data between processes under the hood, particularly for tensor parallel inference.
 ```

+(deployment-docker-build-image-from-source)=
+
 ## Building vLLM's Docker Image from Source

 You can build and run vLLM from source via the provided <gh-file:Dockerfile>. To build vLLM:
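The build command itself sits just below this hunk and is not shown; for orientation, it is along these lines (a sketch; the target and tag are assumptions based on vLLM's usual image name):

```console
# Hedged sketch: the actual command falls outside this hunk; --target and
# --tag values here are assumptions.
DOCKER_BUILDKIT=1 docker build . --target vllm-openai --tag vllm/vllm-openai
```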

docs/source/features/compatibility_matrix.md

Lines changed: 3 additions & 1 deletion
@@ -322,7 +322,9 @@ Check the '✗' with links to see tracking issue for unsupported feature/hardwar

 ```

-### Feature x Hardware
+(feature-x-hardware)=
+
+## Feature x Hardware

 ```{list-table}
 :header-rows: 1

docs/source/getting_started/installation/hpu-gaudi.md renamed to docs/source/getting_started/installation/ai_accelerator/hpu-gaudi.inc.md

Lines changed: 41 additions & 31 deletions
@@ -1,38 +1,23 @@
-(installation-gaudi)=
+# Installation

-# Installation for Intel® Gaudi®
+This tab provides instructions on running vLLM with Intel Gaudi devices.

-This README provides instructions on running vLLM with Intel Gaudi devices.
+## Requirements

-## Requirements and Installation
+- OS: Ubuntu 22.04 LTS
+- Python: 3.10
+- Intel Gaudi accelerator
+- Intel Gaudi software version 1.18.0

 Please follow the instructions provided in the [Gaudi Installation
 Guide](https://docs.habana.ai/en/latest/Installation_Guide/index.html)
 to set up the execution environment. To achieve the best performance,
 please follow the methods outlined in the [Optimizing Training Platform
 Guide](https://docs.habana.ai/en/latest/PyTorch/Model_Optimization_PyTorch/Optimization_in_Training_Platform.html).

-### Requirements
-
-- OS: Ubuntu 22.04 LTS
-- Python: 3.10
-- Intel Gaudi accelerator
-- Intel Gaudi software version 1.18.0
-
-### Quick start using Dockerfile
-
-```console
-docker build -f Dockerfile.hpu -t vllm-hpu-env .
-docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --rm vllm-hpu-env
-```
-
-```{tip}
-If you're observing the following error: `docker: Error response from daemon: Unknown runtime specified habana.`, please refer to "Install Using Containers" section of [Intel Gaudi Software Stack and Driver Installation](https://docs.habana.ai/en/v1.18.0/Installation_Guide/Bare_Metal_Fresh_OS.html). Make sure you have `habana-container-runtime` package installed and that `habana` container runtime is registered.
-```
+## Configure a new environment

-### Build from source
-
-#### Environment verification
+### Environment verification

 To verify that the Intel Gaudi software was correctly installed, run:
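The verification commands fall outside this hunk; per the Gaudi documentation they look roughly like this (a sketch; the exact package checks are assumptions):

```console
# Hedged sketch of the environment-verification step, which is elided from
# the hunk above (exact checks are assumptions).
hl-smi                              # every Gaudi accelerator should be listed
apt list --installed | grep habana  # habanalabs-* driver packages present
pip list | grep habana              # habana-torch-plugin and related wheels present
```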
@@ -47,7 +32,7 @@ Refer to [Intel Gaudi Software Stack
 Verification](https://docs.habana.ai/en/latest/Installation_Guide/SW_Verification.html#platform-upgrade)
 for more details.

-#### Run Docker Image
+### Run Docker Image

 It is highly recommended to use the latest Docker image from Intel Gaudi
 vault. Refer to the [Intel Gaudi
@@ -61,7 +46,13 @@ docker pull vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-i
 docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
 ```

-#### Build and Install vLLM
+## Set up using Python
+
+### Pre-built wheels
+
+Currently, there are no pre-built Intel Gaudi wheels.
+
+### Build wheel from source

 To build and install vLLM from source, run:
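The body of this code block is elided from the hunk; its tail (`git checkout habana_main`, `python setup.py develop`) appears as context in the next hunk, so the full sequence is presumably along these lines (the fork URL is an assumption):

```console
# Hedged sketch: clone the HabanaAI fork (URL assumed, not shown in the hunk),
# switch to the habana_main branch, and install in development mode.
git clone https://github.com/HabanaAI/vllm-fork.git
cd vllm-fork
git checkout habana_main
python setup.py develop
```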
@@ -80,7 +71,26 @@ git checkout habana_main
 python setup.py develop
 ```

-## Supported Features
+## Set up using Docker
+
+### Pre-built images
+
+Currently, there are no pre-built Intel Gaudi images.
+
+### Build image from source
+
+```console
+docker build -f Dockerfile.hpu -t vllm-hpu-env .
+docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --rm vllm-hpu-env
+```
+
+```{tip}
+If you're observing the following error: `docker: Error response from daemon: Unknown runtime specified habana.`, please refer to "Install Using Containers" section of [Intel Gaudi Software Stack and Driver Installation](https://docs.habana.ai/en/v1.18.0/Installation_Guide/Bare_Metal_Fresh_OS.html). Make sure you have `habana-container-runtime` package installed and that `habana` container runtime is registered.
+```
+
+## Extra information
+
+## Supported features

 - [Offline inference](#offline-inference)
 - Online serving via [OpenAI-Compatible Server](#openai-compatible-server)
@@ -94,14 +104,14 @@ python setup.py develop
   for accelerating low-batch latency and throughput
 - Attention with Linear Biases (ALiBi)

-## Unsupported Features
+## Unsupported features

 - Beam search
 - LoRA adapters
 - Quantization
 - Prefill chunking (mixed-batch inferencing)

-## Supported Configurations
+## Supported configurations

 The following configurations have been validated to be function with
 Gaudi2 devices. Configurations that are not listed may or may not work.
@@ -137,7 +147,7 @@ Gaudi2 devices. Configurations that are not listed may or may not work.
 - [meta-llama/Meta-Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct)
   with tensor parallelism on 8x HPU, BF16 datatype with random or greedy sampling

-## Performance Tuning
+## Performance tuning

 ### Execution modes
@@ -368,7 +378,7 @@ Additionally, there are HPU PyTorch Bridge environment variables impacting vLLM
 - `PT_HPU_LAZY_MODE`: if `0`, PyTorch Eager backend for Gaudi will be used, if `1` PyTorch Lazy backend for Gaudi will be used, `1` is default
 - `PT_HPU_ENABLE_LAZY_COLLECTIVES`: required to be `true` for tensor parallel inference with HPU Graphs

-## Troubleshooting: Tweaking HPU Graphs
+## Troubleshooting: tweaking HPU graphs

 If you experience device out-of-memory issues or want to attempt
 inference at higher batch sizes, try tweaking HPU Graphs by following
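For orientation, these variables would be combined with a serving command roughly as follows (a sketch: the entry point, model, and tensor-parallel size are taken from elsewhere on this page, and no actual invocation appears in the hunk):

```console
# Hedged sketch: lazy collectives are required for tensor-parallel inference
# with HPU Graphs; model and TP size follow the 8x HPU example above.
PT_HPU_LAZY_MODE=1 PT_HPU_ENABLE_LAZY_COLLECTIVES=true \
    python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3.1-70B-Instruct \
    --tensor-parallel-size 8
```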
