Your current environment
8× NVIDIA GeForce RTX 3090 GPUs
vLLM 0.4.1
🐛 Describe the bug
Hello vLLM team,
When I try to run the Qwen1.5-32B model with a Punica LoRA request and tensor parallel size = 8, I encounter the following error:
punica_kernels.dispatch_bgmv_low_level(
(RayWorkerWrapper pid=1566630) ERROR 04-26 16:28:41 worker_base.py:157] RuntimeError: No suitable kernel. h_in=64 h_out=3424 dtype=Float out_dtype=BFloat16
I am running on a machine with 8 RTX 3090s and would like to use all of them. I suspect the failing shape h_out=3424 is the per-GPU tensor-parallel shard of the model's intermediate size (27392 / 8 = 3424), which does not seem to be among the shapes the precompiled Punica kernels support. Is there any chance you could take a look at this issue? Thank you very much!
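For reference, a minimal sketch of how I launch the model (the chat variant, adapter name, and adapter path below are placeholders, not the exact values from my setup; the `LLM` / `LoRARequest` API is the standard offline-inference API in vLLM 0.4.1):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Load Qwen1.5-32B with LoRA enabled, sharded across all 8 GPUs.
llm = LLM(
    model="Qwen/Qwen1.5-32B-Chat",  # placeholder: any Qwen1.5-32B checkpoint
    enable_lora=True,
    tensor_parallel_size=8,
    dtype="bfloat16",
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=128)

# "/path/to/my-lora-adapter" stands in for a locally fine-tuned adapter.
outputs = llm.generate(
    ["Hello, how are you?"],
    sampling_params,
    lora_request=LoRARequest("my-adapter", 1, "/path/to/my-lora-adapter"),
)
print(outputs[0].outputs[0].text)
```

The same model and adapter work for me at smaller tensor parallel sizes; the RuntimeError above only appears with tensor_parallel_size=8.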