
fix(http-client): tcp warmup count#153

Merged
nvzhihanj merged 1 commit into main from fix/viraatc-warmup-conn
Mar 4, 2026

Conversation

@viraatc
Collaborator

@viraatc viraatc commented Mar 4, 2026

What does this PR do?

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

@viraatc viraatc requested a review from a team as a code owner March 4, 2026 22:16
Copilot AI review requested due to automatic review settings March 4, 2026 22:16
@github-actions

github-actions bot commented Mar 4, 2026

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@github-actions github-actions bot requested review from arekay-nv and nvzhihanj March 4, 2026 22:16
@viraatc viraatc changed the title from "fix tcp warmup" to "fix(http-client): tcp warmup count" Mar 4, 2026
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the warmup_connections parameter, transitioning it from a simple boolean flag to a more flexible integer-based configuration. This change allows for precise control over TCP connection warming, enabling automatic, disabled, or explicitly counted warm-up connections. The update impacts how client connections are managed during benchmarking and probing, providing enhanced configurability and ensuring consistent behavior across the system.

Highlights

  • Refactored warmup_connections parameter: The warmup_connections parameter has been changed from a boolean to an integer type, providing more granular control over TCP connection warming.
  • New warmup_connections values: Introduced new integer values for warmup_connections: -1 for automatic warming (50% of the connection pool), 0 to disable warming, and >0 for an explicit count of connections to warm up.
  • Updated client and command configurations: Client settings, benchmark commands, and probe commands have been updated to utilize the new integer-based warmup_connections parameter, ensuring consistent behavior across the system.
  • ZMQ Context Management: Integrated ManagedZMQContext for proper lifecycle management within the benchmark utility, ensuring ZMQ resources are correctly handled.
  • Integration Test Migration: Numerous integration tests have been updated to reflect the change in warmup_connections parameter type and its new integer values.
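The three-way integer semantics described above can be sketched as a small resolver. This is an illustrative helper, not code from the PR: the function name `resolve_warmup_count` and the `pool_size` parameter are assumptions; only the value mapping (-1 → 50% of the pool, 0 → disabled, >0 → explicit count) comes from the PR summary.

```python
def resolve_warmup_count(warmup_connections: int, pool_size: int) -> int:
    """Map the warmup_connections setting to a concrete connection count.

    -1 -> auto: warm 50% of the connection pool
     0 -> warming disabled
    >0 -> warm exactly that many connections
    """
    if warmup_connections == -1:
        return pool_size // 2
    if warmup_connections < -1:
        raise ValueError(f"invalid warmup_connections: {warmup_connections}")
    return warmup_connections
```

For example, with a pool of 100 connections, `-1` resolves to 50 warmed connections, `0` disables warming, and `8` warms exactly 8.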


Changelog
  • src/inference_endpoint/commands/benchmark.py
    • Updated warmup_connections default value from True to -1 in client settings.
  • src/inference_endpoint/commands/probe.py
    • Changed warmup_connections value from False to 0 for probe client configuration.
  • src/inference_endpoint/config/schema.py
    • Modified warmup_connections type from bool to int.
    • Set default warmup_connections to -1 (auto).
    • Added detailed comments explaining the new integer values for warmup_connections.
  • src/inference_endpoint/utils/benchmark_httpclient.py
    • Imported ManagedZMQContext.
    • Added zmq_context parameter to _create_client function.
    • Updated warmup_connections to 0 in client configuration within _create_client.
    • Passed zmq_context to AsyncHttpEndpointClient constructor.
    • Implemented ManagedZMQContext for lifecycle management in run_benchmark.
  • tests/integration/commands/test_benchmark_command.py
    • Updated warmup_connections from False to 0 in all benchmark command test cases.
  • tests/integration/endpoint_client/conftest.py
    • Changed warmup_connections parameter type from bool to int and default from False to 0 in create_futures_client.
    • Updated docstring for warmup_connections to reflect new integer values.
  • tests/integration/endpoint_client/test_external_serving.py
    • Updated warmup_connections from False to 0 in client configuration.
  • tests/integration/endpoint_client/test_http_client.py
    • Updated warmup_connections from False to 0 in multiple test client configurations.
  • tests/integration/endpoint_client/test_sglang_adapter.py
    • Updated warmup_connections from False to 0 in SGLang client configuration.
  • tests/integration/endpoint_client/test_worker.py
    • Updated warmup_connections from False to 0 in various worker configurations.
  • tests/integration/endpoint_client/test_worker_manager.py
    • Updated warmup_connections from False to 0 in worker manager configurations.
  • tests/integration/test_end_to_end_oracle.py
    • Updated warmup_connections from False to 0 in DeepSeekR1SampleIssuer HTTP client configuration.
  • tests/performance/conftest.py
    • Updated warmup_connections from False to 0 in HTTP client configuration.
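Based on the schema changes listed above, the updated field might look roughly like this. This is a dataclass stand-in for the real `ClientSettings` in `schema.py`; the field name, `int` type, and `-1` default come from the changelog, while the validation in `__post_init__` is an illustrative assumption.

```python
from dataclasses import dataclass


@dataclass
class ClientSettings:
    # TCP connection warmup:
    #   -1 -> auto (warm 50% of the connection pool)
    #    0 -> disabled
    #   >0 -> explicit number of connections to warm
    warmup_connections: int = -1

    def __post_init__(self) -> None:
        # Hypothetical guard: reject values below -1, which have no meaning
        # under the new semantics.
        if self.warmup_connections < -1:
            raise ValueError(
                f"warmup_connections must be >= -1, got {self.warmup_connections}"
            )
```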
Activity
  • No specific activity (comments, reviews, progress) has been recorded for this pull request yet.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the warmup_connections setting from a boolean to an integer, providing more granular control over connection warming. However, a critical security issue was identified in src/inference_endpoint/utils/benchmark_httpclient.py where manual ZMQ context management could lead to resource leaks and potential denial of service if exceptions occur during benchmark execution. This manual context manager handling is not robust; using a with statement is recommended for proper resource management. Otherwise, the changes are consistently applied and improve the flexibility of the HTTP client configuration.

Collaborator

@arekay-nv arekay-nv left a comment


Awesome, thanks!


Copilot AI left a comment


Pull request overview

This PR changes the warmup_connections configuration field from bool to int across the codebase, enabling more granular control over TCP connection warmup. The old boolean semantics (True/False) are replaced with integer semantics where -1 means auto (50% of pool), 0 means disabled, and >0 specifies an explicit total count. Additionally, the benchmark_httpclient.py utility is updated to properly manage a ZMQ context and pass it through to the AsyncHttpEndpointClient.

Changes:

  • Changed warmup_connections type from bool to int in both config schemas (config.py, schema.py) and updated all call sites (tests and production code) from False to 0 and True to -1.
  • Added ZMQ context management to run_benchmark() in benchmark_httpclient.py, creating a ManagedZMQContext and passing it through _create_client to AsyncHttpEndpointClient.
  • Updated the create_futures_client test helper signature and docstring to reflect the new int type.
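The ZMQ lifecycle concern raised in the reviews (clean teardown even when the benchmark body raises) can be illustrated without pyzmq. `ManagedZMQContext` is the project's own class; the stand-in below is purely illustrative and only mirrors the `with`-based cleanup pattern the Gemini review recommends.

```python
class ManagedContext:
    """Illustrative stand-in for a managed ZMQ context: the resource is
    acquired on __enter__ and reliably released on __exit__, even if the
    benchmark body raises."""

    def __init__(self) -> None:
        self.closed = False

    def __enter__(self) -> "ManagedContext":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # A real ZMQ context would call term() here.
        self.closed = False or True
        return None


def run_benchmark() -> ManagedContext:
    ctx = ManagedContext()
    with ctx:
        # The PR passes the context down: _create_client(..., zmq_context=ctx),
        # then on to the AsyncHttpEndpointClient constructor. The context
        # outlives the client for the duration of the benchmark run.
        pass
    return ctx
```

The advantage of the `with` form over manually calling `__enter__`/`__exit__` is that cleanup is guaranteed on every exit path, including exceptions, which addresses the resource-leak concern in the review.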

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/inference_endpoint/endpoint_client/config.py Changed warmup_connections type from bool to int with -1 default
src/inference_endpoint/config/schema.py Updated ClientSettings.warmup_connections from bool to int with documentation
src/inference_endpoint/commands/benchmark.py Updated default from True to -1 in CLI config builder
src/inference_endpoint/commands/probe.py Changed warmup_connections=False to warmup_connections=0
src/inference_endpoint/utils/benchmark_httpclient.py Added ZMQ context management and zmq_context parameter; changed warmup to 0
tests/integration/endpoint_client/conftest.py Updated helper function type annotation and docstring
tests/integration/endpoint_client/test_worker.py Changed False to 0 in 4 locations
tests/integration/endpoint_client/test_worker_manager.py Changed False to 0 in 2 locations
tests/integration/endpoint_client/test_http_client.py Changed False to 0 in 2 locations
tests/integration/endpoint_client/test_sglang_adapter.py Changed False to 0
tests/integration/endpoint_client/test_external_serving.py Changed False to 0
tests/integration/commands/test_benchmark_command.py Changed False to 0 in 4 locations
tests/integration/test_end_to_end_oracle.py Changed False to 0
tests/performance/conftest.py Changed False to 0


@nvzhihanj nvzhihanj merged commit e27ff4f into main Mar 4, 2026
9 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Mar 4, 2026
@viraatc viraatc deleted the fix/viraatc-warmup-conn branch April 1, 2026 22:58