
[Bug]: Running the punica lora on Qwen1.5 32B model encountered RuntimeError: No suitable kernel. h_in=64 h_out=3424 dtype=Float out_dtype=BFloat16 #4708

@victorzhz111

Description


Your current environment

8× NVIDIA RTX 3090 GPUs
vLLM 0.4.1

🐛 Describe the bug

Hello vllm team,

When I try to run the Qwen1.5 32B model with a punica LoRA request and tensor parallel size = 8, I encounter the following error:
punica_kernels.dispatch_bgmv_low_level(
(RayWorkerWrapper pid=1566630) ERROR 04-26 16:28:41 worker_base.py:157] RuntimeError: No suitable kernel. h_in=64 h_out=3424 dtype=Float out_dtype=BFloat16

I am using a 3090 machine with 8 GPUs and I would like to use all of them. Could you take a look at this issue? Thank you very much!
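The `h_out=3424` in the error message likely comes from tensor-parallel sharding: the punica kernels are compiled only for a fixed set of dimensions, and the per-GPU shard of one of the model's weight matrices is not among them. A minimal sketch of the arithmetic, assuming Qwen1.5-32B's `intermediate_size` is 27392 (as reported in its `config.json`) and that `h_in=64` is the LoRA rank:

```python
# Hypothetical sketch: why h_out=3424 can appear with tensor_parallel_size=8.
# Assumption: Qwen1.5-32B has intermediate_size = 27392 in its config.json,
# and the MLP weight is sharded evenly across the tensor-parallel GPUs.
intermediate_size = 27392
tensor_parallel_size = 8

# Each GPU holds one shard of the output dimension.
shard_dim = intermediate_size // tensor_parallel_size
print(shard_dim)  # 3424 — matches the h_out in the RuntimeError
```

If this arithmetic holds, the shard size 3424 would need to be in the punica kernels' precompiled dimension list for the LoRA kernel dispatch to succeed; a different tensor parallel size may produce a shard dimension that is supported.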
