Your current environment
8× NVIDIA GeForce RTX 3090 GPUs
vLLM 0.4.1
🐛 Describe the bug
Hello vLLM team,
When I try to run the Qwen1.5-32B model with a Punica LoRA request and tensor parallel size = 8, I encounter the following error:
punica_kernels.dispatch_bgmv_low_level(
(RayWorkerWrapper pid=1566630) ERROR 04-26 16:28:41 worker_base.py:157] RuntimeError: No suitable kernel. h_in=64 h_out=3424 dtype=Float out_dtype=BFloat16
I am running on a machine with 8 RTX 3090s and would like to use all of them. I suspect the failing shape h_out=3424 is the per-GPU tensor-parallel shard of the model's intermediate size (27392 / 8 = 3424), which does not seem to be among the shapes the precompiled Punica kernels support. Is there any chance you could take a look at this issue? Thank you very much!
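For reference, a minimal sketch of how I launch the model (the chat variant, adapter name, and adapter path below are placeholders, not the exact values from my setup; the `LLM` / `LoRARequest` API is the standard offline-inference API in vLLM 0.4.1):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Load Qwen1.5-32B with LoRA enabled, sharded across all 8 GPUs.
llm = LLM(
    model="Qwen/Qwen1.5-32B-Chat",  # placeholder: any Qwen1.5-32B checkpoint
    enable_lora=True,
    tensor_parallel_size=8,
    dtype="bfloat16",
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=128)

# "/path/to/my-lora-adapter" stands in for a locally fine-tuned adapter.
outputs = llm.generate(
    ["Hello, how are you?"],
    sampling_params,
    lora_request=LoRARequest("my-adapter", 1, "/path/to/my-lora-adapter"),
)
print(outputs[0].outputs[0].text)
```

The same model and adapter work for me at smaller tensor parallel sizes; the RuntimeError above only appears with tensor_parallel_size=8.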