[core][misc] simply output processing with shortcut for non-parallel sampling and non-beam search usecase #7117

youkaichao · 2024-08-04T01:49:58Z

No description provided.

youkaichao · 2024-08-04T01:50:07Z

This is a small step toward #7116.

github-actions · 2024-08-04T01:50:09Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which consists of a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build in the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

Comment /ready on the PR
Add ready label to the PR
Enable auto-merge.

🚀

youkaichao · 2024-08-04T01:52:15Z

On H100:

python benchmarks/benchmark_throughput.py --input-len 256 --output-len 256 --model meta-llama/Meta-Llama-3-8B

Main branch (3 trials):

Throughput: 24.32 requests/s, 12453.04 tokens/s
Throughput: 24.11 requests/s, 12342.76 tokens/s
Throughput: 24.29 requests/s, 12434.40 tokens/s

This PR (3 trials):

Throughput: 24.52 requests/s, 12555.73 tokens/s
Throughput: 24.59 requests/s, 12591.35 tokens/s
Throughput: 24.74 requests/s, 12667.68 tokens/s

This is already a roughly 1.5% improvement in throughput, averaged over the three trials.
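
The improvement can be checked from the six reported numbers. Below is a minimal Python sketch, assuming the comparison is made on the mean of the three trials per branch (the comment does not say how the figure was derived). The variable and function names are illustrative only and are not part of vLLM or its benchmark script.

```python
from statistics import mean

# Requests/s copied from the benchmark output above.
main_rps = [24.32, 24.11, 24.29]
pr_rps = [24.52, 24.59, 24.74]

# Tokens/s copied from the same output.
main_tps = [12453.04, 12342.76, 12434.40]
pr_tps = [12555.73, 12591.35, 12667.68]


def relative_gain(baseline, candidate):
    """Percent change of mean(candidate) relative to mean(baseline).

    Assumption: trials are summarized by their arithmetic mean.
    """
    return (mean(candidate) / mean(baseline) - 1.0) * 100.0


rps_gain = relative_gain(main_rps, pr_rps)
tps_gain = relative_gain(main_tps, pr_tps)

print(f"requests/s gain: {rps_gain:.2f}%")
print(f"tokens/s gain:   {tps_gain:.2f}%")
```

Both ratios come out slightly above 1.5%. Every trial on this PR is above every trial on main, though three runs per branch is a small sample.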

zhuohan123

LGTM!

simply output processing

dc7ae67

remove redundant attr access

3281e92

zhuohan123 approved these changes Aug 4, 2024

youkaichao merged commit 83c644f into vllm-project:main Aug 4, 2024
31 checks passed

youkaichao deleted the simplify_output_process branch August 4, 2024 07:22

dtrifiro mentioned this pull request Aug 5, 2024

Sync with upstream@v0.5.4-7-g9118217f opendatahub-io/vllm#120

Closed

sfc-gh-mkeralapura pushed a commit to sfc-gh-mkeralapura/vllm that referenced this pull request Aug 12, 2024

[core][misc] simply output processing with shortcut code path (vllm-p…

89882bb

…roject#7117)

kylesayrs pushed a commit to neuralmagic/vllm that referenced this pull request Aug 17, 2024

[core][misc] simply output processing with shortcut code path (vllm-p…

80ebbea

…roject#7117)

Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024

[core][misc] simply output processing with shortcut code path (vllm-p…

b695988

…roject#7117) Signed-off-by: Alvant <alvasian@yandex.ru>

KuntaiDu pushed a commit to KuntaiDu/vllm that referenced this pull request Nov 20, 2024

[core][misc] simply output processing with shortcut code path (vllm-p…

5b05887

…roject#7117)
