Description
I am exploring whether vLLM can be deployed on a GPU that is simultaneously being used for model training. Specifically, for a small model (e.g., 3B) that occupies only a portion of GPU memory during inference, would it be possible to run vLLM inference and training on the same GPU?
E.g., for a single-node setup with 8 GPUs:
- vLLM server runs on GPU0.
- Model training utilizes all 8 GPUs, including GPU0.
Would this setup be feasible? Thank you.
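
To make this concrete, here is a rough sketch of what I have in mind. The model name and the 0.3 memory fraction are just placeholders, and I am not sure this is the right approach; the idea is to cap vLLM's memory reservation on GPU0 with gpu_memory_utilization so the remaining memory on that GPU stays available for the training process.

```python
# Rough sketch of the intended setup (not a verified recipe).
# Assumption: gpu_memory_utilization caps how much of GPU0's memory vLLM
# pre-allocates for weights + KV cache, leaving the rest for training.

# --- Process 1: vLLM inference, pinned to GPU0 ---
# Launched with: CUDA_VISIBLE_DEVICES=0 python serve_vllm.py
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-3B-Instruct",  # placeholder 3B model
    gpu_memory_utilization=0.3,        # reserve only ~30% of GPU0 for vLLM
    enforce_eager=True,                # skip CUDA graphs to reduce extra memory use
)
out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)

# --- Process 2: training job using all 8 GPUs, including GPU0 ---
# Launched separately with: torchrun --nproc_per_node=8 train.py
```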