[Build] Add Benchmark dependencies to requirements/common.txt #19145
Conversation
Hello @simon-mo, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello! Gemini here, providing a summary of this pull request authored by simon-mo. The primary goal of this PR is to prepare the project for benchmarking by adding necessary dependencies. Specifically, it adds `pandas` and `datasets` to the common requirements file, `requirements/common.txt`. This change is intended to ensure that the environment has the required libraries when running benchmark-related code.
Highlights
- Add Benchmark Dependencies: This pull request introduces `pandas` and `datasets` as dependencies by adding them to the `requirements/common.txt` file. These libraries are specified as being needed for the benchmarks module.
Changelog
- `requirements/common.txt`
  - Added `pandas` to the list of common requirements (line 51).
  - Added `datasets` to the list of common requirements (line 52).
  - Both dependencies are noted as being required for the benchmarks module.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Deps are added now,
For benchmarks, they are needed,
Code can run faster.
Footnotes
[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This pull request adds `pandas` and `datasets` as dependencies to support the benchmark module, which is a useful addition. The changes are straightforward, but there are a couple of points regarding dependency management that could be addressed to improve stability and maintainability: specifically, version pinning for these new dependencies, and whether they should be optional dependencies rather than part of `common.txt`.
Summary of Findings
- Dependency Version Pinning: The newly added dependencies `pandas` and `datasets` in `requirements/common.txt` are not version-pinned. This could lead to instability or build issues if future versions introduce breaking changes. It's recommended to specify versions (e.g., `pandas==X.Y.Z` or `pandas>=X.Y.Z`).
- Placement of Benchmark-Specific Dependencies: `pandas` and `datasets` are noted as being for the 'benchmarks module'. Including them in `common.txt` makes them mandatory for all users. Consider making them optional dependencies (e.g., via `extras_require` in `setup.py` or a separate `requirements-benchmark.txt`) to keep the core installation lean; see the sketch after this list.
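To make the separate-requirements-file option concrete, here is a minimal sketch; the file name `requirements/benchmark.txt` and its contents are hypothetical and not part of this PR:

```
# requirements/benchmark.txt (hypothetical, not part of this PR)
-r common.txt   # pull in the core requirements first
pandas          # needed for benchmarks module
datasets        # needed for benchmarks module
```

Users who need the benchmarks would then run `pip install -r requirements/benchmark.txt`, while `requirements/common.txt` stays lean for everyone else.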
Merge Readiness
The pull request aims to add necessary dependencies for benchmark support, which is valuable. However, there are a couple of concerns regarding dependency management:
- Version Pinning: The new dependencies (`pandas`, `datasets`) should have their versions specified to ensure stability and reproducibility.
- Optional Dependencies: Since these dependencies appear to be solely for benchmarks, they might be better suited as optional dependencies rather than being added to `common.txt`, which affects all users.
I recommend addressing these points before merging to maintain code quality and a lean core dependency set. As an AI reviewer, I am not authorized to approve pull requests; please ensure further review and approval from project maintainers after addressing the feedback.
```
pandas # needed for benchmarks module
datasets # needed for benchmarks module
```
For consistency with other packages in this file (e.g., `opentelemetry-sdk>=1.26.0`, `transformers >= 4.51.1`), and to ensure reproducible builds and prevent unexpected breakages from upstream changes, would it be better to pin versions for `pandas` and `datasets`?

For example, you could specify the versions you tested with (e.g., `pandas==X.Y.Z`) or a minimum compatible version (e.g., `pandas>=X.Y.Z`). This helps maintain stability over time.
```
pandas==<version_tested_with> # needed for benchmarks module
datasets==<version_tested_with> # needed for benchmarks module
```
Both are relatively stable and very widely used, so I prefer not to pin them, to avoid conflicts.
```
pandas # needed for benchmarks module
datasets # needed for benchmarks module
```
Given that `pandas` and `datasets` are specifically for the 'benchmarks module' (as per the comments and their usage with `PlaceholderModule` in `vllm/benchmarks/datasets.py`), should these be included in `common.txt`? Adding them here makes them mandatory for all installations of vLLM, even for users who don't run benchmarks.

Could these be moved to an optional dependency group (e.g., defined in `setup.py` under `extras_require={'benchmark': ['pandas', 'datasets']}` so users can install with `pip install vllm[benchmark]`), or to a separate requirements file like `requirements/benchmark.txt`? This would help keep the core dependency set lighter for general users.
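To illustrate the `extras_require` suggestion, here is a minimal, hypothetical sketch; vLLM's real `setup.py` is far more involved, so treat this as an outline of the shape rather than the project's actual configuration:

```python
# Hypothetical sketch only; not vLLM's actual setup.py.
from setuptools import setup

setup(
    name="vllm",
    # ... all other build configuration elided ...
    extras_require={
        # Installed only when users opt in, e.g. `pip install vllm[benchmark]`.
        "benchmark": ["pandas", "datasets"],
    },
)
```

With an extra like this, a plain `pip install vllm` would stay lean, and only users who explicitly request `vllm[benchmark]` would pull in `pandas` and `datasets`.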
But they are imported in the CLI on the critical path right now.
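For background, `vllm/benchmarks/datasets.py` is described above as using a `PlaceholderModule` for these imports. A generic, hypothetical sketch of that kind of deferred-import pattern (not vLLM's actual implementation, whose API may differ) looks roughly like this:

```python
# Hypothetical sketch of a placeholder-module pattern for optional deps.
# Not vLLM's actual PlaceholderModule; shown only to illustrate the idea.
class _MissingModule:
    def __init__(self, name: str) -> None:
        self._name = name

    def __getattr__(self, attr: str):
        # Fail only when the optional dependency is actually used.
        raise ImportError(
            f"{self._name} is required for this feature; "
            f"install it with `pip install {self._name}`."
        )


try:
    import pandas  # optional: only needed for benchmarks
except ImportError:
    pandas = _MissingModule("pandas")
```

With such a pattern, the module can still be imported when the optional package is missing, and the error surfaces only once the dependency is actually used.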
👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger a full CI run by default; only a small and essential subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. 🚀
Summary
https://chatgpt.com/codex/tasks/task_e_68404dbc92f4832985a94abfa89e8fe9