I have two questions:
- I attempted multi-GPU inference with Llama-13B (8-GPU inference on A100s). I followed the steps described in #188 (CUDA error: out of memory), first running

  ```
  $ ray start --head
  ```

  and then

  ```python
  llm = LLM(model=<your model>, tensor_parallel_size=8)
  ```

  However, I got the following error:

  ```
  (Worker pid=1027546) AssertionError: 32001 is not divisible by 8 [repeated 7x across cluster]
  ```

  Is there any way to resolve this issue? (A minimal sketch of what I ran is included after this list.)

- Additionally, is there a way to specify which GPUs are used during inference? I tried setting

  ```python
  os.environ["CUDA_VISIBLE_DEVICES"] = "2"
  ```

  but it doesn't seem to have any effect: inference still runs on the first GPU. (See the second sketch below.)
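For reference, here is a minimal sketch of roughly what I ran for the first question; the model path, prompt, and sampling parameters are placeholders:

```python
# Minimal sketch of the 8-way tensor-parallel run (after `ray start --head`).
# Model path, prompt, and sampling parameters are placeholders.
from vllm import LLM, SamplingParams

# The AssertionError (32001 is not divisible by 8) is raised here.
llm = LLM(model="/path/to/llama-13b", tensor_parallel_size=8)

sampling_params = SamplingParams(temperature=0.8, max_tokens=128)
outputs = llm.generate(["Hello, my name is"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```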
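And here is roughly how I tried to pin inference to a single GPU for the second question; the environment variable is set at the top of the script before vLLM is imported (my understanding is that it needs to be in place before any CUDA context is created), and the model path is again a placeholder:

```python
# Sketch of the single-GPU pinning attempt.
# CUDA_VISIBLE_DEVICES is set before importing vLLM so it is already in place
# when CUDA is initialized.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "2"

from vllm import LLM

llm = LLM(model="/path/to/llama-13b")  # still appears to run on GPU 0
outputs = llm.generate(["Hello, my name is"])
print(outputs[0].outputs[0].text)
```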
Thanks!