[Hardware][TPU][V1] Multi-LoRA implementation for the V1 TPU backend #14238
Right now I see there's a …
My suggestion is still to get this merged, but to wait until TPU has decent performance before enabling it.
A number of changes here don't touch the TPU area directly: they adjust both logic and interfaces in LoRA-specific code, have been reviewed, and still provide value in setting up the landscape.
The follow-up PRs can focus on changes to TpuModelRunner and pallas+punica.
vllm/worker/tpu_worker.py (outdated)

if vllm_config.lora_config is not None:
    raise NotImplementedError(
        """The V0 TPU backend doesn't support LoRA serving, please try \
V1 by setting VLLM_USE_V1=1""")
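For context, a minimal sketch of what the error message suggests — running LoRA through the V1 engine from Python. The base model name and adapter path below are placeholders, not values from this PR:

import os
os.environ["VLLM_USE_V1"] = "1"  # opt in to the V1 engine before importing vLLM

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Hypothetical base model and adapter path; substitute your own.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)
outputs = llm.generate(
    ["What is the capital of France?"],
    SamplingParams(max_tokens=32),
    lora_request=LoRARequest("my-adapter", 1, "/path/to/adapter"),
)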
I think this is misleading if we decide not to enable it just yet.
Yep, I'll undo that for now.
Regarding the CI errors I see here:
Thanks @NickLucche, I think that's fixed the TPU tests. I've merged from main, but the GPU-side errors are still there. What's really fun is that they're errors in the Triton kernels, which shouldn't be affected by this at all.
This pull request has merge conflicts that must be resolved before it can be merged.
@classmethod
def get_infinity_values(cls, dtype: torch.dtype) -> Tuple[float, float]:
    """
    Return the platform specific values for (-inf, inf)
    """
    return float("-inf"), float("inf")
Why do we ignore the dtype here?
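One way to make the dtype parameter meaningful — a minimal sketch of a dtype-aware override, not the code in this PR — is to return the finite extremes of the given dtype via torch.finfo, which can be safer on backends that mishandle true infinities:

import torch
from typing import Tuple

class TpuPlatform:  # hypothetical stand-in for the real platform class
    @classmethod
    def get_infinity_values(cls, dtype: torch.dtype) -> Tuple[float, float]:
        # torch.finfo reports the representable min/max for a float dtype;
        # returning these instead of true infinities sidesteps inf handling.
        info = torch.finfo(dtype)
        return float(info.min), float(info.max)

# e.g. for bfloat16 this yields roughly (-3.39e38, 3.39e38)
print(TpuPlatform.get_infinity_values(torch.bfloat16))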
This pull request has merge conflicts that must be resolved before it can be merged.
LGTM, thanks for your great contribution!
The TPU CI test failure seems unrelated to this PR, because it also happens in 621ca2c.
This PR adds a Multi-LoRA implementation that works on the TPU backend, extending the work done in #11100 and superseding #12623. It includes a functional but unoptimised Pallas implementation of the bgmv kernel.
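For readers unfamiliar with the operation: bgmv (batched gather matrix-vector multiply) applies, to each token, the LoRA matrix selected by that token's adapter index. A minimal PyTorch sketch of the semantics — argument names are illustrative, and this is a reference loop, not the Pallas kernel — looks like:

import torch

def bgmv_reference(
    y: torch.Tensor,        # (num_tokens, out_dim), output accumulated in place
    x: torch.Tensor,        # (num_tokens, in_dim), input activations
    weights: torch.Tensor,  # (num_loras, out_dim, in_dim), stacked LoRA matrices
    indices: torch.Tensor,  # (num_tokens,), adapter id per token; -1 means no LoRA
    scale: float = 1.0,
) -> torch.Tensor:
    for i in range(x.shape[0]):
        idx = int(indices[i])
        if idx >= 0:
            # Gather this token's LoRA matrix and accumulate its contribution.
            y[i] += scale * (weights[idx] @ x[i])
    return y

An optimised kernel fuses the per-token gather over indices with the matrix-vector products; the Pallas implementation in this PR plays that role on TPU, currently without performance tuning.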