
[Misc] Add Next Edit Prediction (NEP) datasets support in benchmark_serving.py #16839


Merged

Conversation

Contributor

@dtransposed dtransposed commented Apr 18, 2025

I am currently using benchmark scripts to profile one of the Next Edit Prediction (NEP) models, zeta on their dataset.

This dataset is functionally similar to the existing likaixin/InstructCoder dataset, in that both are useful for benchmarking speculative decoding models. However, I decided to create a separate HuggingFaceDataset object, since NEP models have their own specific input structure (a history of edits, a prefix, and sometimes a suffix). The code should be extensible to other, similar use cases.

  • Fix linting and DCO
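Since NEP samples carry structured fields rather than a single prompt string, a dataset wrapper mainly needs to flatten those fields into one prompt. The sketch below is illustrative only: the field names (`events`, `input_prefix`, `input_suffix`) and the flattening scheme are assumptions for this example, not the actual zeta schema or the PR's implementation.

```python
from dataclasses import dataclass

@dataclass
class NEPSample:
    """Illustrative NEP sample; field names are assumed, not the real zeta schema."""
    events: list          # history of prior edits, oldest first
    input_prefix: str     # file content before the edit location
    input_suffix: str = ""  # optional file content after the edit location

def build_nep_prompt(sample: NEPSample) -> str:
    """Flatten the structured sample into a single prompt string for benchmarking."""
    history = "\n".join(sample.events)
    return f"{history}\n{sample.input_prefix}{sample.input_suffix}"

# Example usage with a made-up sample.
sample = NEPSample(
    events=["User edited foo.py: renamed x to count"],
    input_prefix="def total(count):\n    return ",
)
prompt = build_nep_prompt(sample)
```

A separate dataset class keeps this flattening logic in one place, so adding another NEP-style dataset only requires mapping its fields onto the same structure.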


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@dtransposed dtransposed marked this pull request as ready for review April 18, 2025 11:05
@dtransposed dtransposed force-pushed the feature/damian/add_nep_benchmarks branch 2 times, most recently from 586f9ba to e9e1d5b Compare April 18, 2025 12:24
Member

@mgoin mgoin left a comment

We started moving these benchmark scripts internal to the package so we can have CLI commands. Could you either add it here instead or in both places please? https://github.com/vllm-project/vllm/blob/686623c5e7a0ee0c7679c052ced565dd83055709/vllm/benchmarks/datasets.py

I think the current complication is that the vllm bench serve command doesn't have datasets hooked up yet, but it should be trivial once I've added these: #16508

Contributor

@NickLucche NickLucche left a comment

Thanks for your PR!
Is Zeta the only NEP dataset available and supported as of now?

@dtransposed
Contributor Author

> We started moving these benchmark scripts internal to the package so we can have CLI commands. Could you either add it here instead or in both places please? https://github.com/vllm-project/vllm/blob/686623c5e7a0ee0c7679c052ced565dd83055709/vllm/benchmarks/datasets.py
>
> I think the current complication is that the vllm bench serve command doesn't have datasets hooked up yet, but it should be trivial once I've added these: #16508

This is great! I was afraid I would have to keep and maintain a separate fork of the benchmarks!


mergify bot commented Apr 22, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @dtransposed.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@dtransposed dtransposed force-pushed the feature/damian/add_nep_benchmarks branch from 2d89515 to 7bd4f02 Compare April 22, 2025 09:33
@mergify mergify bot added the documentation, ci/build, frontend, multi-modality, speculative-decoding, v1, and tpu labels Apr 22, 2025
Contributor

@NickLucche NickLucche left a comment

LGTM!

@dtransposed
Contributor Author

dtransposed commented Apr 24, 2025

@NickLucche why can't I land this? Is it because of the buildkite/ci/pr job?

@dtransposed
Contributor Author

@mgoin @NickLucche just a kind reminder, would love to land it.


mergify bot commented Apr 30, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @dtransposed.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Apr 30, 2025
Signed-off-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>

Signed-off-by: dtransposed <>
@dtransposed
Contributor Author

@mgoin @NickLucche kind reminder.

@mergify mergify bot removed the needs-rebase label May 5, 2025
@dtransposed dtransposed requested a review from NickLucche May 5, 2025 06:39
@NickLucche
Contributor

NickLucche commented May 6, 2025

Sorry for the late reply!
Can you try to sync with main once again? #17677 just landed so we might be able to merge this without forcing (due to failing tests).

Member

@mgoin mgoin left a comment

LGTM thanks

@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label May 6, 2025
@dtransposed
Contributor Author

@NickLucche on it

@NickLucche
Contributor

Looks like pre-commit is failing, please give it another run locally when you find the time!

@dtransposed
Contributor Author

dtransposed commented May 6, 2025

> Looks like pre-commit is failing, please give it another run locally when you find the time!

```
(.venv) (base) damian@damian-ml-machine:~/vllm$ pre-commit run --all-files
yapf.....................................................................Passed
ruff.....................................................................Passed
codespell................................................................Passed
isort....................................................................Passed
clang-format.............................................................Passed
PyMarkdown...............................................................Passed
Lint GitHub Actions workflow files.......................................Passed
pip-compile..............................................................Passed
Run mypy for local Python installation...................................Passed
Lint shell scripts.......................................................Passed
Lint PNG exports from excalidraw.........................................Passed
Check SPDX headers.......................................................Passed
Check for spaces in all filenames........................................Passed
Update Dockerfile dependency graph.......................................Passed
Suggestion...............................................................Passed
- hook id: suggestion
- duration: 0s
```

:(
Maybe I can just rerun the failed jobs (if that's possible at some point).

@dtransposed
Contributor Author

@mgoin the failing test seems unrelated ("works locally on my machine", heh). Would you mind force-landing it?

@dtransposed dtransposed changed the base branch from main to qwen25vl May 6, 2025 19:13
@dtransposed dtransposed changed the base branch from qwen25vl to main May 6, 2025 19:14
@mgoin mgoin merged commit d456aea into vllm-project:main May 6, 2025
52 of 53 checks passed
@mgoin
Member

mgoin commented May 6, 2025

Just had to retry the pre-commit; it seems it failed to download the base Docker image.

robertgshaw2-redhat added a commit to neuralmagic/vllm that referenced this pull request May 6, 2025
* [Model] Add GraniteMoeHybrid 4.0 model (vllm-project#17497)

Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Co-authored-by: Thomas Ortner <boh@zurich.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>

* [easy] Fix logspam on PiecewiseBackend errors (vllm-project#17138)

Signed-off-by: rzou <zou3519@gmail.com>

* [Bugfix] Fixed prompt length for random dataset (vllm-project#17408)

Signed-off-by: Mikhail Podvitskii <podvitskiymichael@gmail.com>

* [Doc] Update notes for H2O-VL and Gemma3 (vllm-project#17219)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* [Misc] Fix ScalarType float4 naming  (vllm-project#17690)

Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>

* Fix `dockerfilegraph` pre-commit hook (vllm-project#17698)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [Bugfix] Fix triton import with local TritonPlaceholder (vllm-project#17446)

Signed-off-by: Mengqing Cao <cmq0113@163.com>

* [V1] Enable TPU V1 backend by default (vllm-project#17673)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [V1][PP] Support PP for MultiprocExecutor (vllm-project#14219)

Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: jiang.li <jiang1.li@intel.com>

* [v1] AttentionMetadata for each layer (vllm-project#17394)

Signed-off-by: Chen Zhang <zhangch99@outlook.com>

* [Feat] Add deprecated=True to CLI args (vllm-project#17426)

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

* [Docs] Use gh-file to add links to tool_calling.md (vllm-project#17709)

Signed-off-by: windsonsea <haifeng.yao@daocloud.io>

* [v1] Introduce KVCacheBlocks as interface between Scheduler and KVCacheManager (vllm-project#17479)

Signed-off-by: Chen Zhang <zhangch99@outlook.com>

* [doc] Add RAG Integration example (vllm-project#17692)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* [Bugfix] Fix modality limits in vision language example (vllm-project#17721)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* Make right sidebar more readable in "Supported Models" (vllm-project#17723)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [TPU] Increase block size and reset block shapes (vllm-project#16458)

* [Misc] Add Next Edit Prediction (NEP) datasets support in `benchmark_serving.py` (vllm-project#16839)

Signed-off-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Signed-off-by: dtransposed <>
Co-authored-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>

* [Bugfix] Fix for the condition to accept empty encoder inputs for mllama (vllm-project#17732)

Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>

* [Kernel] Unified Triton kernel that doesn't distinguish between prefill + decode (vllm-project#16828)

Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>

---------

Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Signed-off-by: rzou <zou3519@gmail.com>
Signed-off-by: Mikhail Podvitskii <podvitskiymichael@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: jiang.li <jiang1.li@intel.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
Signed-off-by: reidliu41 <reid201711@gmail.com>
Signed-off-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Signed-off-by: dtransposed <>
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: Stan Wozniak <77159600+s3woz@users.noreply.github.com>
Co-authored-by: Thomas Ortner <boh@zurich.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Richard Zou <zou3519@users.noreply.github.com>
Co-authored-by: Mikhail Podvitskii <podvitskiymichael@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Michael Yao <haifeng.yao@daocloud.io>
Co-authored-by: Reid <61492567+reidliu41@users.noreply.github.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: Jevin Jiang <jevin0change@gmail.com>
Co-authored-by: d.transposed <damian.bogunowicz@gmail.com>
Co-authored-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
…serving.py` (vllm-project#16839)

Signed-off-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Signed-off-by: dtransposed <>
Co-authored-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
mawong-amd pushed a commit to ROCm/vllm that referenced this pull request May 14, 2025
…serving.py` (vllm-project#16839)

Signed-off-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Signed-off-by: dtransposed <>
Co-authored-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
zzzyq pushed a commit to zzzyq/vllm that referenced this pull request May 24, 2025
…serving.py` (vllm-project#16839)

Signed-off-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Signed-off-by: dtransposed <>
Co-authored-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
@ekagra-ranjan
Contributor

ekagra-ranjan commented May 27, 2025

Hi folks! @mgoin @NickLucche - I noticed that we have benchmark_dataset.py and datasets.py. What is the difference between the two? I am trying to add a dataset, so do I need to add it to one of them or to both?

Got the answer from this: #16839 (review)

minpeter pushed a commit to minpeter/vllm that referenced this pull request Jun 24, 2025
…serving.py` (vllm-project#16839)

Signed-off-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Signed-off-by: dtransposed <>
Co-authored-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Signed-off-by: minpeter <kali2005611@gmail.com>
Labels
ci/build, documentation, frontend, multi-modality, ready, speculative-decoding, v1