
[Build] Add Benchmark dependencies to requirements/common.txt #19145

Closed · wants to merge 1 commit

Conversation

@simon-mo (Collaborator) commented Jun 4, 2025

Summary

  • Add pandas and datasets to the common requirements for benchmark support; otherwise a clean install will fail.

https://chatgpt.com/codex/tasks/task_e_68404dbc92f4832985a94abfa89e8fe9

@gemini-code-assist bot (Contributor) left a comment

Hello @simon-mo, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello! Gemini here, providing a summary of this pull request authored by simon-mo. The primary goal of this PR is to prepare the project for benchmarking by adding necessary dependencies. Specifically, it adds pandas and datasets to the common requirements file, requirements/common.txt. This change is intended to ensure that the environment has the required libraries when running benchmark-related code.

Highlights

  • Add Benchmark Dependencies: This pull request introduces pandas and datasets as dependencies by adding them to the requirements/common.txt file. These libraries are specified as being needed for the benchmarks module.

Changelog

  • requirements/common.txt
    • Added pandas to the list of common requirements (line 51).
    • Added datasets to the list of common requirements (line 52).
    • Both dependencies are noted as being required for the benchmarks module.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in issue comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


Deps are added now,
For benchmarks, they are needed,
Code can run faster.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@mergify bot added the ci/build label Jun 4, 2025
@simon-mo changed the title from "Add benchmark deps" to "[Build] Add Benchmark dependencies to requirements/common.txt" Jun 4, 2025
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request adds pandas and datasets as dependencies to support the benchmark module, which is a useful addition. The changes are straightforward, but there are a couple of points regarding dependency management that could be addressed to improve stability and maintainability: consider version pinning for the new dependencies, and evaluate whether they should be optional dependencies rather than part of common.txt.

Summary of Findings

  • Dependency Version Pinning: The newly added dependencies pandas and datasets in requirements/common.txt are not version-pinned. This could lead to instability or build issues if future versions introduce breaking changes. It's recommended to specify versions (e.g., pandas==X.Y.Z or pandas>=X.Y.Z).
  • Placement of Benchmark-Specific Dependencies: pandas and datasets are noted as being for the 'benchmarks module'. Including them in common.txt makes them mandatory for all users. Consider making them optional dependencies (e.g., via extras_require in setup.py or a separate requirements-benchmark.txt) to keep the core installation lean.

Merge Readiness

The pull request aims to add necessary dependencies for benchmark support, which is valuable. However, there are a couple of concerns regarding dependency management:

  1. Version Pinning: The new dependencies (pandas, datasets) should have their versions specified to ensure stability and reproducibility.
  2. Optional Dependencies: Since these dependencies appear to be solely for benchmarks, they might be better suited as optional dependencies rather than being added to common.txt, which affects all users.

I recommend addressing these points before merging to maintain code quality and a lean core dependency set. As an AI reviewer, I am not authorized to approve pull requests; please ensure further review and approval from project maintainers after addressing the feedback.

Comment on lines +51 to +52
pandas # needed for benchmarks module
datasets # needed for benchmarks module
@gemini-code-assist bot (Contributor) commented (severity: high):

For consistency with other packages in this file (e.g., opentelemetry-sdk>=1.26.0, transformers >= 4.51.1) and to ensure reproducible builds and prevent unexpected breakages from upstream changes, would it be better to pin versions for pandas and datasets?

For example, you could specify the versions you tested with (e.g., pandas==X.Y.Z) or a minimum compatible version (e.g., pandas>=X.Y.Z). This helps maintain stability over time.

pandas==<version_tested_with>  # needed for benchmarks module
datasets==<version_tested_with>  # needed for benchmarks module

@simon-mo (Collaborator, Author) replied:

Both are relatively stable and very widely used, so I prefer not to pin them, to avoid conflicts.

Comment on lines +51 to +52
pandas # needed for benchmarks module
datasets # needed for benchmarks module
@gemini-code-assist bot (Contributor) commented (severity: medium):

Given that pandas and datasets are specifically for the 'benchmarks module' (as per the comments and their usage with PlaceholderModule in vllm/benchmarks/datasets.py), should these be included in common.txt? Adding them here makes them mandatory for all installations of vLLM, even for users who don't run benchmarks.

Could these be moved to an optional dependency group (e.g., defined in setup.py under extras_require={'benchmark': ['pandas', 'datasets']} so users can install with pip install vllm[benchmark]) or a separate requirements file like requirements/benchmark.txt? This would help keep the core dependency set lighter for general users.
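
A minimal sketch of the suggested optional dependency group, assuming setuptools packaging; the extra name benchmark and the exact arguments are illustrative, not vLLM's actual setup.py:

# setup.py (sketch): expose benchmark-only deps as an optional extra so
# the default install stays lean. Names here are hypothetical.
from setuptools import setup

setup(
    name="vllm",
    # ... other packaging arguments elided ...
    extras_require={
        # Installed only via `pip install vllm[benchmark]`
        "benchmark": ["pandas", "datasets"],
    },
)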

@simon-mo (Collaborator, Author) replied:

But they are currently imported in the CLI on the critical path.

@simon-mo requested a review from mgoin June 4, 2025 13:53

github-actions bot commented Jun 4, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mgoin (Member) commented Jun 4, 2025

@simon-mo I would prefer to keep these dataset deps separate and optional, as achieved in this PR by lazy-importing them: #19089.
The breaking change on main was not intentional.
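
For context, a minimal sketch of the lazy-import pattern being referred to, assuming a function-level deferred import; the function name and error message are illustrative, not vLLM's actual PlaceholderModule implementation:

# Sketch: defer the heavy optional import until the benchmark path runs,
# so a clean install without pandas can still import the CLI successfully.
# Names below are hypothetical.
def load_benchmark_dataframe(path: str):
    try:
        import pandas as pd  # imported only when benchmarks actually run
    except ImportError as exc:
        raise ImportError(
            "pandas is required for the benchmarks module; "
            "install it with `pip install pandas`."
        ) from exc
    return pd.read_csv(path)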

@simon-mo closed this Jun 4, 2025