[Usage]: How to start vLLM on a particular GPU? #4981

kstyagi23 · 2024-05-22T12:41:56Z

Your current environment

Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 
Clang version: Could not collect
CMake version: version 3.29.3
Libc version: glibc-2.31

Python version: 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-1056-azure-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA A100 80GB PCIe
GPU 1: NVIDIA A100 80GB PCIe

Nvidia driver version: 545.23.08
cuDNN version: Probably one of the following:
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn.so.8.7.0
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.7.0
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.7.0
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.7.0
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.7.0
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.7.0
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.7.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit 
Byte Order:                         Little Endian  
Address sizes:                      48 bits physical, 48 bits virtual
CPU(s):                             48
On-line CPU(s) list:                0-47
Thread(s) per core:                 1
Core(s) per socket:                 48
Socket(s):                          1
NUMA node(s):                       2
Vendor ID:                          AuthenticAMD   
CPU family:                         25
Model:                              1
Model name:                         AMD EPYC 7V13 64-Core Processor
Stepping:                           1
CPU MHz:                            2445.437
BogoMIPS:                           4890.87
Hypervisor vendor:                  Microsoft
Virtualization type:                full
L1d cache:                          1.5 MiB
L1i cache:                          1.5 MiB
L2 cache:                           24 MiB
L3 cache:                           192 MiB
NUMA node0 CPU(s):                  0-23
NUMA node1 CPU(s):                  24-47
Vulnerability Gather data sampling: Not affected   
Vulnerability Itlb multihit:        Not affected   
Vulnerability L1tf:                 Not affected   
Vulnerability Mds:                  Not affected   
Vulnerability Meltdown:             Not affected   
Vulnerability Mmio stale data:      Not affected   
Vulnerability Retbleed:             Not affected   
Vulnerability Spec rstack overflow: Mitigation; safe RET, no microcode
Vulnerability Spec store bypass:    Vulnerable
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; Retpolines, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:                Not affected   
Vulnerability Tsx async abort:      Not affected   
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core invpcid_single vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr rdpru arat umip vaes vpclmulqdq rdpid fsrm

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] torch==2.3.0
[pip3] triton==2.3.0
[pip3] vllm_nccl_cu12==2.18.1.0.4.0
[conda] numpy                     1.26.4                   pypi_0    pypi
[conda] nvidia-nccl-cu12          2.20.5                   pypi_0    pypi
[conda] torch                     2.3.0                    pypi_0    pypi
[conda] triton                    2.3.0                    pypi_0    pypi
[conda] vllm-nccl-cu12            2.18.1.0.4.0             pypi_0    pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.4.2
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0    GPU1    NIC0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV12    SYS     0-23    0               N/A
GPU1    NV12     X      SYS     24-47   1               N/A
NIC0    SYS     SYS      X

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_0

How would you like to use vllm

I have two GPUs in my VM... I am already using vLLM on one of the GPUs and the other one is vacant.
How can I start a second vLLM instance on the second GPU of mine?

I tried:

--device cuda    |    --device auto    |    --device cuda:1

but they don't seem to work as I was expecting...

Could you please tell me what I am missing here?

Regards!

DarkLight1337 · 2024-05-22T13:33:36Z

You can use the CUDA_VISIBLE_DEVICES environment variable when running the command.
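
As a minimal sketch of that suggestion, the helper below builds a launch command and an environment in which only one physical GPU is visible to the new process. The model name and port are placeholders chosen for illustration, the function names `build_launch` and `launch_on_gpu` are hypothetical, and the sketch assumes the OpenAI-compatible server entrypoint `vllm.entrypoints.openai.api_server` is the one being started.

```python
import os
import subprocess
import sys


def build_launch(gpu_index, port, model):
    """Return (cmd, env) for a vLLM server restricted to one physical GPU."""
    # Copy the parent environment so the current process is left untouched.
    env = dict(os.environ)
    # Only this GPU will be visible to the child; inside it, it appears as cuda:0.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    cmd = [
        sys.executable, "-m", "vllm.entrypoints.openai.api_server",
        "--model", model,      # placeholder model name
        "--port", str(port),   # use a port not taken by the first instance
    ]
    return cmd, env


def launch_on_gpu(gpu_index, port, model):
    """Start the server as a child process (requires vLLM to be installed)."""
    cmd, env = build_launch(gpu_index, port, model)
    return subprocess.Popen(cmd, env=env)
```

For the two-GPU machine in the question, something like `launch_on_gpu(1, 8001, "facebook/opt-125m")` would start the second instance on GPU 1 while the first keeps running on GPU 0. The shell equivalent is to prefix the usual command with `CUDA_VISIBLE_DEVICES=1`.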

fengshansi · 2024-05-22T16:01:34Z

I changed CUDA_VISIBLE_DEVICES, but when I then deleted CUDA_VISIBLE_DEVICES to load another model, I got an error: CUDA error: invalid device ordinal.

DarkLight1337 · 2024-05-23T00:37:07Z

Can you show the commands (including env variables) which you used to run vLLM?

fengshansi · 2024-05-23T06:36:37Z

I use a script to select the GPU with the most memory, so I have to delete the CUDA_VISIBLE_DEVICES environment variable after loading one model and before loading the next. However, when I move the new model to the device I selected, I get the error.
Actually, I don't think this bug is caused by vLLM. Even without vLLM, if I set CUDA_VISIBLE_DEVICES and then unset it to load another model, I get an error. I don't think setting CUDA_VISIBLE_DEVICES is a good way to choose a GPU.

fengshansi · 2024-05-23T06:55:20Z

It appears that if you set the CUDA_VISIBLE_DEVICES environment variable, for example os.environ["CUDA_VISIBLE_DEVICES"] = "2,3", then the device indices in your code start from 0. That is, cuda:0 corresponds to physical GPU 2, and cuda:1 corresponds to physical GPU 3.
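
The renumbering described above can be sketched as a small lookup. This is an illustration only, not code from vLLM or PyTorch: the function name `physical_gpu` is hypothetical, and the sketch assumes CUDA_VISIBLE_DEVICES holds a comma-separated list of integer indices (it can also hold GPU UUIDs, which are not handled here).

```python
def physical_gpu(logical_index: int, visible_devices: str) -> int:
    """Map a logical index (the N in cuda:N) to a physical GPU index.

    visible_devices is the value of CUDA_VISIBLE_DEVICES, e.g. "2,3".
    """
    visible = [int(part) for part in visible_devices.split(",") if part.strip()]
    if not 0 <= logical_index < len(visible):
        # Asking for a device beyond the visible list is one way to end up
        # with "CUDA error: invalid device ordinal".
        raise IndexError(
            f"cuda:{logical_index} requested, but only {len(visible)} device(s) visible"
        )
    return visible[logical_index]


print(physical_gpu(0, "2,3"))  # 2
print(physical_gpu(1, "2,3"))  # 3
```

With "2,3" visible, a request for cuda:2 has nothing to map to, even though the machine has a physical GPU 2. A script that picks a physical index and then uses it directly as cuda:N may run into exactly this mismatch.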

DarkLight1337 · 2024-05-23T06:58:03Z

Usually, I set the environment variable in the command line instead of inside Python, e.g.:

CUDA_VISIBLE_DEVICES=0,1 python -m <command>

This is because the environment variable needs to be set before PyTorch is imported for it to take effect properly, and that ordering is difficult to guarantee when the variable is set inside Python.
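
If the variable does have to be set from inside Python, one defensive pattern is to check whether PyTorch has already been imported at the moment the variable is written. The sketch below is an assumption-laden illustration: `pin_gpu` is a hypothetical helper, and it only detects an earlier `import torch` in the same process, not every way CUDA might already have been initialized.

```python
import os
import sys
import warnings


def pin_gpu(physical_index: int) -> bool:
    """Set CUDA_VISIBLE_DEVICES and report whether it was set early enough.

    Returns True if torch had not yet been imported in this process,
    False (with a warning) if it had, in which case the setting may be ignored.
    """
    set_before_import = "torch" not in sys.modules
    if not set_before_import:
        warnings.warn(
            "torch is already imported; CUDA_VISIBLE_DEVICES may not take effect"
        )
    os.environ["CUDA_VISIBLE_DEVICES"] = str(physical_index)
    return set_before_import
```

Calling `pin_gpu(1)` as the first statement of a script, before any `import torch` or `import vllm`, keeps the ordering explicit. Setting the variable on the command line avoids the problem altogether.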

fengshansi · 2024-05-23T07:37:01Z

I have several models and GPUs, so I have to set CUDA_VISIBLE_DEVICES several times, and I get the error. Setting CUDA_VISIBLE_DEVICES is not a good approach. I think people with several models and GPUs need a device parameter.

DarkLight1337 · 2024-05-23T08:01:07Z

You can run multiple vLLM commands simultaneously, each with a different GPU.

fengshansi · 2024-05-23T09:20:16Z

I have decided not to use vLLM. vLLM has a DeviceConfig configuration, and you can pass a device parameter to vllm.LLM, but the KV cache does not use it and always uses cuda:0. This is too messy.

kstyagi23 added the usage How to use vllm label May 22, 2024

fengshansi mentioned this issue May 24, 2024

[Usage]: Is it possible to start 8 tp=1 LLMEngine on a 8-GPU machine? #4969

Closed

DarkLight1337 mentioned this issue Jun 13, 2024

Add cuda_device_count_stateless #5473

Merged

simon-mo closed this as completed in #5473 Jun 13, 2024
