This guide explains how to quickly run vLLM on Gaudi using a prebuilt Docker image and Docker Compose, with options for custom parameters and benchmarking.

Supports a wide range of validated models, including the LLaMa, Mistral, and Qwen families, with flexible configuration via environment variables or YAML files.
## Supported Models
## How to Use
### 1. Run the server using Docker Compose
The recommended and easiest way to start the vLLM server is with Docker Compose. At a minimum, set the following environment variables:

- `MODEL` - Select a model from the table above.
- `HF_TOKEN` - Your Hugging Face token (generate one at <https://huggingface.co>).
- `DOCKER_IMAGE` - The vLLM Docker image URL from the Gaudi or a local repository.

**Example usage:**

```bash
cd vllm-fork/.cd/
MODEL="Qwen/Qwen2.5-14B-Instruct" \
HF_TOKEN="<your huggingface token>" \
DOCKER_IMAGE="<docker image url>" \
docker compose up
```
### 2. Running the Server with a Benchmark

To easily run a benchmark dedicated to a specific model using default parameters, use the `--profile benchmark up` option with Docker Compose:

```bash
cd vllm-fork/.cd/
MODEL="Qwen/Qwen2.5-14B-Instruct" \
HF_TOKEN="<your huggingface token>" \
DOCKER_IMAGE="<docker image url>" \
docker compose --profile benchmark up
```
This launches the vLLM server and runs the benchmark suite automatically.
### 3. Run the server using Docker Compose with custom parameters

You can override the default server parameters by setting additional environment variables before running Docker Compose, as shown in the sketch below.

Tips:

- Model files can be large. For best performance, use an external disk for the Huggingface cache and set `HF_HOME` accordingly.
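
A minimal sketch, assuming the compose file forwards server overrides from the environment; `TENSOR_PARALLEL_SIZE` and `MAX_MODEL_LEN` are illustrative names, so substitute the server variables documented in the repository:

```bash
cd vllm-fork/.cd/
# TENSOR_PARALLEL_SIZE and MAX_MODEL_LEN are hypothetical variable names used
# for illustration; replace them with the variables the compose file reads.
MODEL="Qwen/Qwen2.5-14B-Instruct" \
HF_TOKEN="<your huggingface token>" \
DOCKER_IMAGE="<docker image url>" \
TENSOR_PARALLEL_SIZE=2 \
MAX_MODEL_LEN=4096 \
docker compose up
```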

### 4. Running the Benchmark with Custom Parameters

You can likewise override the default benchmark parameters by setting environment variables before running Docker Compose with the benchmark profile, as in the sketch below. This will launch the vLLM server and run the benchmark suite using your specified parameters.
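
A minimal sketch, assuming the compose file reads benchmark overrides from the environment; `BENCHMARK_NUM_PROMPTS` is an illustrative name, so substitute the benchmark variables documented in the repository:

```bash
cd vllm-fork/.cd/
# BENCHMARK_NUM_PROMPTS is a hypothetical variable name used for illustration;
# replace it with the benchmark variables the compose file actually reads.
MODEL="Qwen/Qwen2.5-14B-Instruct" \
HF_TOKEN="<your huggingface token>" \
DOCKER_IMAGE="<docker image url>" \
BENCHMARK_NUM_PROMPTS=500 \
docker compose --profile benchmark up
```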
### 5. Running the Server and Benchmark, both with Custom Parameters
You can launch the vLLM server and benchmark together, specifying any combination of optional parameters for both the server and the benchmark. Set the desired environment variables before running Docker Compose.
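
A combined sketch under the same assumptions as sections 3 and 4 (the override names other than `MODEL`, `HF_TOKEN`, and `DOCKER_IMAGE` are illustrative):

```bash
cd vllm-fork/.cd/
# MAX_MODEL_LEN and BENCHMARK_NUM_PROMPTS are hypothetical names used for
# illustration; replace them with the variables documented in the repository.
MODEL="Qwen/Qwen2.5-14B-Instruct" \
HF_TOKEN="<your huggingface token>" \
DOCKER_IMAGE="<docker image url>" \
MAX_MODEL_LEN=4096 \
BENCHMARK_NUM_PROMPTS=500 \
docker compose --profile benchmark up
```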
This command will start the vLLM server and run the benchmark suite using your specified custom parameters.
### 6. Running the Server and Benchmark Using Configuration Files
You can also configure the server and benchmark by specifying parameters in configuration files. To do this, set the following environment variables:

- `VLLM_SERVER_CONFIG_FILE` – Path to the server configuration file inside the Docker container.
- `VLLM_SERVER_CONFIG_NAME` – Name of the server configuration section.
- `VLLM_BENCHMARK_CONFIG_FILE` – Path to the benchmark configuration file inside the Docker container.
- `VLLM_BENCHMARK_CONFIG_NAME` – Name of the benchmark configuration section.

> When using configuration files, you do not need to set the `MODEL` environment variable, as the model name is specified within the configuration file. However, you must still provide your `HF_TOKEN`.
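
A minimal sketch; the file paths and section names below are placeholders for the configuration files available inside the container:

```bash
cd vllm-fork/.cd/
# The paths and section names below are placeholders; point them at the
# configuration files mounted inside the Docker container.
HF_TOKEN="<your huggingface token>" \
DOCKER_IMAGE="<docker image url>" \
VLLM_SERVER_CONFIG_FILE="<path to server config in container>" \
VLLM_SERVER_CONFIG_NAME="<server config section>" \
VLLM_BENCHMARK_CONFIG_FILE="<path to benchmark config in container>" \
VLLM_BENCHMARK_CONFIG_NAME="<benchmark config section>" \
docker compose --profile benchmark up
```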
### 7. Running the Server Directly with Docker
For full control, you can run the server using the `docker run` command. This approach allows you to specify any native Docker parameters as needed.
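
A minimal sketch, assuming the standard Gaudi container flags from the Intel Gaudi documentation; adjust the image URL, ports, and any server arguments to match your setup:

```bash
# --runtime=habana, HABANA_VISIBLE_DEVICES, --cap-add=sys_nice, --net=host, and
# --ipc=host are the usual Gaudi container flags; the image URL is a placeholder.
docker run -it --rm \
  --runtime=habana \
  -e HABANA_VISIBLE_DEVICES=all \
  -e HF_TOKEN="<your huggingface token>" \
  --cap-add=sys_nice \
  --net=host \
  --ipc=host \
  <docker image url>
```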