
How to specify which GPU the model runs inference on? #352

Closed as not planned

Description

@zoubaihan

Hello, I have 4 GPUs. When I set tensor_parallel_size to 2 and run the service, it takes CUDA:0 and CUDA:1.

My question is: if I want to start two workers (i.e., two processes that each deploy the same model), how do I make the second process use CUDA:2 and CUDA:3?

Because right now, if I just start the second service without any extra configuration, it runs out of memory (OOM).
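For reference, a common way to pin each worker to specific GPUs is to set `CUDA_VISIBLE_DEVICES` before the process initializes CUDA, so each worker only sees its own two devices. Below is a minimal sketch assuming vLLM's offline `LLM` API; the exact import path and argument names may differ by version, and the model name is only a placeholder:

```python
import os

# Pin this worker to physical GPUs 2 and 3 *before* any CUDA/vLLM initialization.
# Inside the process they will appear as cuda:0 and cuda:1, and tensor
# parallelism with size 2 will use only these two devices.
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"

from vllm import LLM, SamplingParams  # assumed vLLM API; adjust to your installed version

llm = LLM(model="facebook/opt-13b", tensor_parallel_size=2)  # placeholder model name
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```

The same environment variable works when launching a serving process from the shell (e.g., `CUDA_VISIBLE_DEVICES=2,3` prefixed to the server command), with each worker listening on a different port; the first worker would use `CUDA_VISIBLE_DEVICES=0,1` in the same way.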
