Closed as not planned
Description
Hello, I have 4 GPUs. When I set tensor_parallel_size to 2 and start the service, it takes CUDA:0 and CUDA:1.
My question is: if I want to start two workers (i.e., two processes serving the same model), how do I make the second process use CUDA:2 and CUDA:3?
Right now, if I just start the second service without any configuration, it OOMs.
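One common way to do this is to restrict each process to a GPU pair via the `CUDA_VISIBLE_DEVICES` environment variable, which CUDA honors before any device enumeration. A rough sketch, assuming the vLLM OpenAI-compatible server entrypoint; the model name and ports are placeholders:

```shell
# Worker 1: sees only physical GPUs 0 and 1 (exposed as cuda:0/cuda:1 inside the process)
CUDA_VISIBLE_DEVICES=0,1 python -m vllm.entrypoints.openai.api_server \
    --model my-model --tensor-parallel-size 2 --port 8000 &

# Worker 2: sees only physical GPUs 2 and 3, so its tensor-parallel ranks
# land on cuda:2/cuda:3 without colliding with worker 1
CUDA_VISIBLE_DEVICES=2,3 python -m vllm.entrypoints.openai.api_server \
    --model my-model --tensor-parallel-size 2 --port 8001 &
```

Each process then enumerates only its two visible GPUs, so both workers can run concurrently without contending for the same memory. A load balancer or client-side round-robin across the two ports would spread requests between them.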