Closed as not planned
Description
Hello, I have 4 GPUs. When I set tensor_parallel_size to 2 and start the service, it takes CUDA:0 and CUDA:1.
My question is: if I want to start two workers (i.e., two processes serving the same model), how do I make the second process use CUDA:2 and CUDA:3?
Right now, if I just start the second service without any configuration, it OOMs.
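One common way to do this is to restrict each process to a GPU pair via the `CUDA_VISIBLE_DEVICES` environment variable, which CUDA honors before any device enumeration. A rough sketch, assuming the vLLM OpenAI-compatible server entrypoint; the model name and ports are placeholders:

```shell
# Worker 1: sees only physical GPUs 0 and 1 (exposed as cuda:0/cuda:1 inside the process)
CUDA_VISIBLE_DEVICES=0,1 python -m vllm.entrypoints.openai.api_server \
    --model my-model --tensor-parallel-size 2 --port 8000 &

# Worker 2: sees only physical GPUs 2 and 3, so its tensor-parallel ranks
# land on cuda:2/cuda:3 without colliding with worker 1
CUDA_VISIBLE_DEVICES=2,3 python -m vllm.entrypoints.openai.api_server \
    --model my-model --tensor-parallel-size 2 --port 8001 &
```

Each process then enumerates only its two visible GPUs, so both workers can run concurrently without contending for the same memory. A load balancer or client-side round-robin across the two ports would spread requests between them.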