Pull requests: vllm-project/tpu-inference


[Chore] Update import path for vllm.utils
#901 opened Oct 20, 2025 by wdhongtw
[WIP] Add Qwen3-Omni model
#896 opened Oct 19, 2025 by eitanporat
add jax support for Qwen2VL
#893 opened Oct 18, 2025 by shungcp
Added the docker login instructions
#891 opened Oct 17, 2025 by hosseinsarshar
[Doc] Docker guide extended
#890 opened Oct 17, 2025 by hosseinsarshar
[GPT-OSS] JAX implementation of GPT-OSS
#861 opened Oct 14, 2025 by bzgoogle
Enable spmd on lora
#829 opened Oct 10, 2025 by vanbasten23
lora spmd
#802 opened Oct 8, 2025 by vanbasten23 (Draft)
Prototyping load weight scale for qwen3.
#741 opened Sep 25, 2025 by inho9606
[Test only] Remove the model cache
#725 opened Sep 22, 2025 by QiliangCui
extract docker build step (wip)
#713 opened Sep 19, 2025 by CienetStingLin
prefill decode microbenchmark for QWen3
#699 opened Sep 16, 2025 by mailvijayasingh
[Misc] Upgrade to flax 0.11.2
#603 opened Aug 28, 2025 by py4
Llama4scout logit checking
#475 opened Aug 13, 2025 by gpolovets1
Parallelize llama4 implementation weight loading
#457 opened Aug 12, 2025 by KWang1998