[Usage]: How to start vLLM on a particular GPU? #4981
Comments
You can use the CUDA_VISIBLE_DEVICES environment variable to control which GPU vLLM runs on.
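For example, to pin the server to a single physical GPU (the model name and port below are placeholders, not taken from this thread):

```bash
# Expose only physical GPU 1 to the process; inside it, that GPU shows up as cuda:0.
CUDA_VISIBLE_DEVICES=1 python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-2-7b-hf --port 8000
```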
I changed CUDA_VISIBLE_DEVICES, and when I deleted CUDA_VISIBLE_DEVICES to load another model, I got an error: CUDA error: invalid device ordinal.
Can you show the commands (including environment variables) that you used to run vLLM?
I use a script to select the GPU with the most free memory, so I have to delete the CUDA_VISIBLE_DEVICES environment variable after loading one model and before loading the next. However, when I move the new model to the device I selected, I get that error.
It appears that if you set the CUDA_VISIBLE_DEVICES environment variable, for example os.environ["CUDA_VISIBLE_DEVICES"] = "2,3", then in your code the device indices start from 0. That is, cuda:0 corresponds to the physical cuda:2, and cuda:1 corresponds to the physical cuda:3.
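A minimal sketch of this remapping (assuming a machine with at least four GPUs; the snippet is illustrative, not from the original comment):

```python
import os

# Must be set before CUDA is initialized, i.e. before importing torch.
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"

import torch

print(torch.cuda.device_count())     # 2 -- only physical GPUs 2 and 3 are visible
x = torch.zeros(1, device="cuda:0")  # allocated on physical GPU 2
y = torch.zeros(1, device="cuda:1")  # allocated on physical GPU 3
```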
Usually, I set the environment variable in the command line instead of inside Python, e.g.:
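(Something along these lines, with a placeholder script name:)

```bash
CUDA_VISIBLE_DEVICES=2,3 python my_inference_script.py
```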
This is because the environment variable needs to be set before PyTorch is imported in order for it to take effect properly, which is difficult to rely on when it is set inside Python.
I have several models and GPUs, so I have to set CUDA_VISIBLE_DEVICES several times, and I get the error. Setting CUDA_VISIBLE_DEVICES is not a good approach. I think that when people have several models and GPUs, they need a device parameter.
You can run multiple vLLM commands simultaneously, each with a different GPU. |
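A sketch of what that looks like for the two-GPU case in this issue (model names and ports are illustrative):

```bash
# Terminal 1: first instance pinned to physical GPU 0
CUDA_VISIBLE_DEVICES=0 python -m vllm.entrypoints.openai.api_server \
    --model <model-a> --port 8000

# Terminal 2: second instance pinned to physical GPU 1, on a different port
CUDA_VISIBLE_DEVICES=1 python -m vllm.entrypoints.openai.api_server \
    --model <model-b> --port 8001
```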
I have decided not to use vLLM. vLLM has a DeviceConfig configuration, and you can pass a device.
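For reference, a rough sketch of passing a device through the engine arguments (in the vLLM versions around this issue, the device option selects a device type such as "cuda" or "cpu", not a specific GPU index, so picking a particular GPU still goes through CUDA_VISIBLE_DEVICES):

```python
from vllm import LLM

# `device` is forwarded to EngineArgs/DeviceConfig; it accepts a device type
# ("auto", "cuda", "cpu", ...), not an index like "cuda:1". The model name is
# a placeholder.
llm = LLM(model="facebook/opt-125m", device="cuda")
```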
Your current environment
How would you like to use vllm
I have two GPUs in my VM... I am already using vLLM on one of the GPUs and the other one is vacant.
How can I start a second vLLM instance on my second GPU?
I tried:
but they don't seem to work as I was expecting...
Could you please tell me what I am missing here?
Regards!