[TPU] Add example for profiling TPU inference #12531
Conversation
Signed-off-by: mgoin <mgoin@redhat.com>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
Signed-off-by: mgoin <mgoin@redhat.com> Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: mgoin <mgoin@redhat.com> Signed-off-by: Linkun Chen <github@lkchen.net>
Signed-off-by: mgoin <mgoin@redhat.com> Signed-off-by: saeediy <saidakbarp@gmail.com>
Provides an example for simple prefill or decode profiling on TPUs. This is a starting point equivalent to text-only inference using `benchmark_latency.py`, where the user can specify only batch size, input length, and output length. Future work should expand this example to cover realistic data and multimodal inference.
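A minimal sketch of how collecting such a profile could look is below. The script path and flag names are assumptions modeled on `benchmark_latency.py`, not necessarily the exact interface this PR adds:

```shell
# Hypothetical invocation; the script path and flag names below are
# assumptions patterned after benchmark_latency.py, not this PR's exact CLI.
PROFILE_DIR=profiles
mkdir -p "$PROFILE_DIR"

# On a TPU VM (commented out here, since it requires TPU hardware):
# python examples/offline_inference/tpu_profile.py \
#     --batch-size 8 \
#     --input-len 128 \
#     --output-len 64 \
#     --profile-result-dir "$PROFILE_DIR"
```

The resulting trace directory can then be opened in TensorBoard, as shown in the screenshot below.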
Example screenshot of a profile in TensorBoard (`tensorboard --logdir profiles/ --port 6006`):