
Multi-GPU inference and specifying which GPUs to use during inference #250

Closed
@EthanC111

Description

I have two questions:

  1. I attempted multi-GPU inference (8-GPU inference on A100s) with Llama-13B. I followed the steps described in #188 (CUDA error: out of memory): first running $ ray start --head and then llm = LLM(model=<your model>, tensor_parallel_size=8). A minimal sketch of my script follows after this list.
    However, I got the following error:
    (Worker pid=1027546) AssertionError: 32001 is not divisible by 8 [repeated 7x across cluster]
    Is there any way to resolve this issue?

  2. Additionally, is there a way to specify which GPUs are used during inference? I tried setting os.environ["CUDA_VISIBLE_DEVICES"] = "2", but it doesn't seem to work; inference still runs on the first GPU. The second sketch below shows exactly what I tried.
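
For reference, here is roughly the script I am running for the first question. It is only a minimal sketch: the model path is a placeholder for my local Llama-13B checkpoint, and I start Ray beforehand with ray start --head as described above.

```python
# Minimal sketch for question 1. Assumes vLLM is installed and that a Ray
# cluster has already been started with `ray start --head`.
from vllm import LLM, SamplingParams

# The model path is a placeholder for my local Llama-13B checkpoint.
llm = LLM(model="<path-to-llama-13b>", tensor_parallel_size=8)

sampling_params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["Hello, my name is"], sampling_params)
print(outputs[0].outputs[0].text)
```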

Thanks!
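
And for the second question, this is the pattern I have been trying. My understanding is that CUDA_VISIBLE_DEVICES only takes effect if it is set before the CUDA context is initialized, so the sketch below sets it before importing vLLM (the model name is just a small placeholder):

```python
# Sketch for question 2: pinning inference to physical GPU 2.
import os

# My understanding is that this must be set before torch / vllm are imported,
# i.e. before any CUDA initialization, for it to take effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "2"

from vllm import LLM

# Inside this process, device 0 should now map to physical GPU 2.
llm = LLM(model="facebook/opt-125m")  # placeholder small model
outputs = llm.generate(["Hello, my name is"])
print(outputs[0].outputs[0].text)
```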
