
[Hardware][TPU][V1] Multi-LoRA implementation for the V1 TPU backend #14238


Merged · 151 commits into vllm-project:main · May 7, 2025

Conversation

Akshat-Tripathi
Contributor

This PR adds a Multi-LoRA implementation that works on the TPU backend, extending the work done in #11100 and superseding #12623. It includes a functional but unoptimised Pallas implementation of the bgmv kernel.
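For context, bgmv (batched gather matrix-vector multiply, from the Punica line of work that vLLM's multi-LoRA support builds on) applies a per-token LoRA matrix selected by an index vector. Below is a minimal PyTorch sketch of the operation's semantics; it is illustrative only (not the Pallas kernel from this PR), and the function and argument names are hypothetical:

```python
import torch

def bgmv_reference(
    x: torch.Tensor,        # [num_tokens, hidden_dim] input activations
    loras: torch.Tensor,    # [num_loras, rank, hidden_dim] stacked LoRA matrices
    indices: torch.Tensor,  # [num_tokens] LoRA index selected for each token
) -> torch.Tensor:
    # Gather each token's LoRA matrix, then apply a per-token
    # matrix-vector product: y[i] = loras[indices[i]] @ x[i].
    selected = loras[indices]                       # [num_tokens, rank, hidden_dim]
    return torch.einsum("trh,th->tr", selected, x)  # [num_tokens, rank]
```

A Pallas kernel has to express the same gather-then-matmul pattern within TPU tiling constraints, which is part of why a first functional version may be left unoptimised.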

@NickLucche
Contributor

Hi, I'm having trouble with the CI here.

Right now I see there's a "No space left on device" error, which we tried to fix just recently; it might be worth giving it another spin. I will review this PR ASAP. Thanks again for the great patience and work here!

@NickLucche (Contributor) left a comment


My suggestion is still to get this merged, but to wait until TPU performance is decent before enabling it.

There are a number of changes, in both logic and interface, to LoRA-specific code that don't impact the "TPU area" directly; these have been reviewed and still provide value in setting up the landscape.

The follow-up PRs can focus on changes to TpuModelRunner and pallas+punica.

Comment on lines 57 to 61
if vllm_config.lora_config is not None:
    raise NotImplementedError(
        """The V0 TPU backend doesn't support LoRA serving, please try \
V1 by setting VLLM_USE_V1=1""")

Contributor


I think this is misleading if we decide not to enable it just yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I'll undo that for now.
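For readers following along: the error message above points users at the V1 backend. Here is a hypothetical sketch of what serving a LoRA adapter with V1 enabled looks like through vLLM's offline API; the model name and adapter path are placeholders:

```python
import os
os.environ["VLLM_USE_V1"] = "1"  # opt into the V1 engine, per the message above

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)
outputs = llm.generate(
    "Write a SQL query listing all users.",
    SamplingParams(max_tokens=64),
    # LoRARequest(name, unique integer id, local adapter path)
    lora_request=LoRARequest("sql-adapter", 1, "/path/to/sql_lora"),
)
```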

@NickLucche
Contributor

Regarding the CI errors I see here:

  • The LoRA test failures should be unrelated, but to be safe we'll have to merge from main again and confirm everything is working.
  • "No space left on device" is happening during the Dockerfile build, hence the fix we landed doesn't cover this case @mgoin. We either need someone with access to the instance, or we can try to debug in a separate PR. I would assume images from previous runs are cleared, but I could be mistaken. It's weird that other PRs are not reporting this.

@Akshat-Tripathi
Contributor Author

Thanks @NickLucche, I think that's fixed the TPU tests. I've merged from main, but the GPU-side errors are still there. What's really fun is that they're errors in the Triton kernels, which shouldn't be affected by this at all.


mergify bot commented May 6, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @Akshat-Tripathi.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label May 6, 2025
@mergify mergify bot removed the needs-rebase label May 6, 2025
Comment on lines +335 to +340
@classmethod
def get_infinity_values(cls, dtype: torch.dtype) -> Tuple[float, float]:
    """
    Return the platform specific values for (-inf, inf)
    """
    return float("-inf"), float("inf")
Member


Why do we ignore the dtype here?
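For context on the question: a dtype-aware variant might return the finite extremes representable in the given dtype rather than true infinities, via torch.finfo. This is a hypothetical sketch of one way the dtype could be used, not code from this PR:

```python
import torch
from typing import Tuple

def get_infinity_values_for_dtype(dtype: torch.dtype) -> Tuple[float, float]:
    # torch.finfo reports the minimum and maximum finite values
    # representable in a floating-point dtype; some backends substitute
    # these for -inf/+inf to avoid propagating actual infinities.
    info = torch.finfo(dtype)
    return info.min, info.max
```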


mergify bot commented May 7, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @Akshat-Tripathi.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label May 7, 2025
@mergify mergify bot removed the needs-rebase label May 7, 2025
@yaochengji (Collaborator) left a comment


LGTM, thanks for your great contribution!

The TPU CI test failure seems unrelated to this PR, since it also happens in 621ca2c.

@mgoin mgoin merged commit c20ef40 into vllm-project:main May 7, 2025
54 checks passed
princepride pushed a commit to princepride/vllm that referenced this pull request May 10, 2025
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
mawong-amd pushed a commit to ROCm/vllm that referenced this pull request May 14, 2025
zzzyq pushed a commit to zzzyq/vllm that referenced this pull request May 24, 2025
minpeter pushed a commit to minpeter/vllm that referenced this pull request Jun 24, 2025
Labels: ci/build · ready (ONLY add when PR is ready to merge/full CI is needed) · tpu (Related to Google TPUs) · v1

8 participants