
fix(http-client): tcp warmup count#153

Merged
nvzhihanj merged 1 commit into main from fix/viraatc-warmup-conn
Mar 4, 2026

Conversation

@viraatc
Collaborator

@viraatc viraatc commented Mar 4, 2026

What does this PR do?

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

@viraatc viraatc requested a review from a team as a code owner March 4, 2026 22:16
Copilot AI review requested due to automatic review settings March 4, 2026 22:16
@github-actions

github-actions bot commented Mar 4, 2026

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@github-actions github-actions bot requested review from arekay-nv and nvzhihanj March 4, 2026 22:16
@viraatc viraatc changed the title from "fix tcp warmup" to "fix(http-client): tcp warmup count" Mar 4, 2026
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the warmup_connections parameter, transitioning it from a simple boolean flag to a more flexible integer-based configuration. This change allows for precise control over TCP connection warming, enabling automatic, disabled, or explicitly counted warm-up connections. The update impacts how client connections are managed during benchmarking and probing, providing enhanced configurability and ensuring consistent behavior across the system.

Highlights

  • Refactored warmup_connections parameter: The warmup_connections parameter has been changed from a boolean to an integer type, providing more granular control over TCP connection warming.
  • New warmup_connections values: Introduced new integer values for warmup_connections: -1 for automatic warming (50% of the connection pool), 0 to disable warming, and >0 for an explicit count of connections to warm up.
  • Updated client and command configurations: Client settings, benchmark commands, and probe commands have been updated to utilize the new integer-based warmup_connections parameter, ensuring consistent behavior across the system.
  • ZMQ Context Management: Integrated ManagedZMQContext for proper lifecycle management within the benchmark utility, ensuring ZMQ resources are correctly handled.
  • Integration Test Migration: Numerous integration tests have been updated to reflect the change in warmup_connections parameter type and its new integer values.
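The three-way integer semantics described above can be sketched as a small resolver. This is an illustrative helper, not code from the PR: the function name `resolve_warmup_count` and the `pool_size` parameter are assumptions; only the value mapping (-1 → 50% of the pool, 0 → disabled, >0 → explicit count) comes from the PR summary.

```python
def resolve_warmup_count(warmup_connections: int, pool_size: int) -> int:
    """Map the warmup_connections setting to a concrete connection count.

    -1 -> auto: warm 50% of the connection pool
     0 -> warming disabled
    >0 -> warm exactly that many connections
    """
    if warmup_connections == -1:
        return pool_size // 2
    if warmup_connections < -1:
        raise ValueError(f"invalid warmup_connections: {warmup_connections}")
    return warmup_connections
```

For example, with a pool of 100 connections, `-1` resolves to 50 warmed connections, `0` disables warming, and `8` warms exactly 8.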


Changelog
  • src/inference_endpoint/commands/benchmark.py
    • Updated warmup_connections default value from True to -1 in client settings.
  • src/inference_endpoint/commands/probe.py
    • Changed warmup_connections value from False to 0 for probe client configuration.
  • src/inference_endpoint/config/schema.py
    • Modified warmup_connections type from bool to int.
    • Set default warmup_connections to -1 (auto).
    • Added detailed comments explaining the new integer values for warmup_connections.
  • src/inference_endpoint/utils/benchmark_httpclient.py
    • Imported ManagedZMQContext.
    • Added zmq_context parameter to _create_client function.
    • Updated warmup_connections to 0 in client configuration within _create_client.
    • Passed zmq_context to AsyncHttpEndpointClient constructor.
    • Implemented ManagedZMQContext for lifecycle management in run_benchmark.
  • tests/integration/commands/test_benchmark_command.py
    • Updated warmup_connections from False to 0 in all benchmark command test cases.
  • tests/integration/endpoint_client/conftest.py
    • Changed warmup_connections parameter type from bool to int and default from False to 0 in create_futures_client.
    • Updated docstring for warmup_connections to reflect new integer values.
  • tests/integration/endpoint_client/test_external_serving.py
    • Updated warmup_connections from False to 0 in client configuration.
  • tests/integration/endpoint_client/test_http_client.py
    • Updated warmup_connections from False to 0 in multiple test client configurations.
  • tests/integration/endpoint_client/test_sglang_adapter.py
    • Updated warmup_connections from False to 0 in SGLang client configuration.
  • tests/integration/endpoint_client/test_worker.py
    • Updated warmup_connections from False to 0 in various worker configurations.
  • tests/integration/endpoint_client/test_worker_manager.py
    • Updated warmup_connections from False to 0 in worker manager configurations.
  • tests/integration/test_end_to_end_oracle.py
    • Updated warmup_connections from False to 0 in DeepSeekR1SampleIssuer HTTP client configuration.
  • tests/performance/conftest.py
    • Updated warmup_connections from False to 0 in HTTP client configuration.
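Based on the schema changes listed above, the updated field might look roughly like this. This is a dataclass stand-in for the real `ClientSettings` in `schema.py`; the field name, `int` type, and `-1` default come from the changelog, while the validation in `__post_init__` is an illustrative assumption.

```python
from dataclasses import dataclass


@dataclass
class ClientSettings:
    # TCP connection warmup:
    #   -1 -> auto (warm 50% of the connection pool)
    #    0 -> disabled
    #   >0 -> explicit number of connections to warm
    warmup_connections: int = -1

    def __post_init__(self) -> None:
        # Hypothetical guard: reject values below -1, which have no meaning
        # under the new semantics.
        if self.warmup_connections < -1:
            raise ValueError(
                f"warmup_connections must be >= -1, got {self.warmup_connections}"
            )
```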
Activity
  • No specific activity (comments, reviews, progress) has been recorded for this pull request yet.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the warmup_connections setting from a boolean to an integer, providing more granular control over connection warming. However, a critical security issue was identified in src/inference_endpoint/utils/benchmark_httpclient.py where manual ZMQ context management could lead to resource leaks and potential denial of service if exceptions occur during benchmark execution. This manual context manager handling is not robust; using a with statement is recommended for proper resource management. Otherwise, the changes are consistently applied and improve the flexibility of the HTTP client configuration.

Collaborator

@arekay-nv arekay-nv left a comment


Awesome, thanks!


Copilot AI left a comment


Pull request overview

This PR changes the warmup_connections configuration field from bool to int across the codebase, enabling more granular control over TCP connection warmup. The old boolean semantics (True/False) are replaced with integer semantics where -1 means auto (50% of pool), 0 means disabled, and >0 specifies an explicit total count. Additionally, the benchmark_httpclient.py utility is updated to properly manage a ZMQ context and pass it through to the AsyncHttpEndpointClient.

Changes:

  • Changed warmup_connections type from bool to int in both config schemas (config.py, schema.py) and updated all call sites (tests and production code) from False to 0 and True to -1.
  • Added ZMQ context management to run_benchmark() in benchmark_httpclient.py, creating a ManagedZMQContext and passing it through _create_client to AsyncHttpEndpointClient.
  • Updated the create_futures_client test helper signature and docstring to reflect the new int type.
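The ZMQ lifecycle concern raised in the reviews (clean teardown even when the benchmark body raises) can be illustrated without pyzmq. `ManagedZMQContext` is the project's own class; the stand-in below is purely illustrative and only mirrors the `with`-based cleanup pattern the Gemini review recommends.

```python
class ManagedContext:
    """Illustrative stand-in for a managed ZMQ context: the resource is
    acquired on __enter__ and reliably released on __exit__, even if the
    benchmark body raises."""

    def __init__(self) -> None:
        self.closed = False

    def __enter__(self) -> "ManagedContext":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # A real ZMQ context would call term() here.
        self.closed = False or True
        return None


def run_benchmark() -> ManagedContext:
    ctx = ManagedContext()
    with ctx:
        # The PR passes the context down: _create_client(..., zmq_context=ctx),
        # then on to the AsyncHttpEndpointClient constructor. The context
        # outlives the client for the duration of the benchmark run.
        pass
    return ctx
```

The advantage of the `with` form over manually calling `__enter__`/`__exit__` is that cleanup is guaranteed on every exit path, including exceptions, which addresses the resource-leak concern in the review.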

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/inference_endpoint/endpoint_client/config.py Changed warmup_connections type from bool to int with -1 default
src/inference_endpoint/config/schema.py Updated ClientSettings.warmup_connections from bool to int with documentation
src/inference_endpoint/commands/benchmark.py Updated default from True to -1 in CLI config builder
src/inference_endpoint/commands/probe.py Changed warmup_connections=False to warmup_connections=0
src/inference_endpoint/utils/benchmark_httpclient.py Added ZMQ context management and zmq_context parameter; changed warmup to 0
tests/integration/endpoint_client/conftest.py Updated helper function type annotation and docstring
tests/integration/endpoint_client/test_worker.py Changed False to 0 in 4 locations
tests/integration/endpoint_client/test_worker_manager.py Changed False to 0 in 2 locations
tests/integration/endpoint_client/test_http_client.py Changed False to 0 in 2 locations
tests/integration/endpoint_client/test_sglang_adapter.py Changed False to 0
tests/integration/endpoint_client/test_external_serving.py Changed False to 0
tests/integration/commands/test_benchmark_command.py Changed False to 0 in 4 locations
tests/integration/test_end_to_end_oracle.py Changed False to 0
tests/performance/conftest.py Changed False to 0


@nvzhihanj nvzhihanj merged commit e27ff4f into main Mar 4, 2026
9 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Mar 4, 2026
@viraatc viraatc deleted the fix/viraatc-warmup-conn branch April 1, 2026 22:58