Closed
Description
At this line of the code, when using vLLM, a single GPU device is specified. In practice, however, it is quite common to run one vLLM instance across multiple GPUs.
- Why is the code designed to select only a single GPU?
- Where is the 'device' parameter of this LLM interface eventually passed to? When I stepped into the function, I couldn't find where this parameter is actually handled (this may be a very basic question).
- When I replaced the 'device' parameter with tensor_parallel_size (and also set world_size and the other related parameters), an error occurred.
I've noticed that some other PRs modify how vLLM is used with multiple GPUs, but none of them touch the interface where LLM is instantiated. I'm curious about the reasons behind this.
If anyone is willing to answer, I would be very grateful.
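For reference, the multi-GPU usage I expected looks roughly like the sketch below. It assumes vLLM's public `LLM(...)` constructor, which accepts `tensor_parallel_size` directly; the `make_engine_kwargs` helper and the model name are mine, just for illustration:

```python
def make_engine_kwargs(num_gpus: int) -> dict:
    """Build keyword arguments for vllm.LLM (illustrative helper, not from the repo).

    With vLLM, multi-GPU inference is normally requested via
    tensor_parallel_size rather than a per-GPU `device` argument;
    vLLM launches its own worker processes, so world_size does not
    need to be set by the caller.
    """
    if num_gpus > 1:
        # Shard the model's weights across `num_gpus` devices.
        return {"tensor_parallel_size": num_gpus}
    return {}  # single GPU: vLLM's defaults suffice


# Intended usage (requires vllm installed and >= 2 GPUs; model name is a placeholder):
#   from vllm import LLM
#   llm = LLM(model="facebook/opt-125m", **make_engine_kwargs(2))
```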