
feat: add perf utils #122

Merged — viraatc merged 6 commits into main from feat/viraatc-perf-utils on Feb 14, 2026

Conversation

@viraatc (Collaborator) commented Feb 10, 2026

What does this PR do?

addresses: #9

A performance-testing utility for the HTTP client. It benchmarks the send/receive rate of HTTPEndpointClient using uvloop, and can either auto-launch a MaxThroughputServer or connect to an external endpoint.

Usage (see all available args in --help):
    python -m inference_endpoint.utils.benchmark_httpclient -w 8 -c 512 -d 20
    python -m inference_endpoint.utils.benchmark_httpclient --endpoint http://host:8080/v1/chat/completions
    python -m inference_endpoint.utils.benchmark_httpclient --no-pin --track-memory

Sweep modes (-w, -c, -l accept ranges; endpoints always included):
    -w 4:12           every int in [4, 12]
    -c 100:500:100    start:stop:step  -> [100, 200, 300, 400, 500]
    -w 1:32::12       start:stop::N    -> 12 evenly-spaced points in [1, 32]
    -l 32,128,512     explicit values
    -w 1:32::12 -c 100:500::4          cartesian product sweep
    --full                             preset sweep of common worker counts x prompt lengths (non-streaming)
    --full --stream                    preset sweep of common worker counts x prompt lengths (streaming)
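The range grammar above can be sketched as a small parser. This is a hypothetical `parse_sweep_spec` helper written from the documented examples, not the PR's actual implementation:

```python
def parse_sweep_spec(spec: str) -> list[int]:
    """Parse a sweep spec: 'a', 'a:b', 'a:b:step', 'a:b::n', or 'v1,v2,...'."""
    if "," in spec:                         # explicit values: 32,128,512
        return [int(v) for v in spec.split(",")]
    parts = spec.split(":")
    if len(parts) == 1:                     # single value
        return [int(parts[0])]
    if len(parts) == 2:                     # a:b -> every int in [a, b]
        a, b = int(parts[0]), int(parts[1])
        return list(range(a, b + 1))
    if len(parts) == 3:                     # a:b:step
        a, b, step = (int(p) for p in parts)
        return list(range(a, b + 1, step))
    if len(parts) == 4 and parts[2] == "":  # a:b::n -> n evenly spaced ints
        a, b, n = int(parts[0]), int(parts[1]), int(parts[3])
        if n == 1:
            return [a]
        # round to ints and dedupe, keeping the endpoints
        return sorted({round(a + i * (b - a) / (n - 1)) for i in range(n)})
    raise ValueError(f"bad sweep spec: {spec!r}")
```

For example, `parse_sweep_spec("100:500:100")` yields `[100, 200, 300, 400, 500]`, matching the `-c 100:500:100` line above; a cartesian-product sweep is then just `itertools.product` over the parsed lists.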

example:

$ -> python scripts/benchmark_httpclient.py --duration 10 --full
================================================================================================
Sweep Summary: num_workers, prompt_length
================================================================================================
   num_workers |  prompt_length |    Send Rate |    Recv Rate | Outstanding |  Stall% |   Errors
------------------------------------------------------------------------------------------------
             1 |              1 |       80,703 |       49,278 |           0 |   62.3% |     0.0%
             1 |             32 |       80,038 |       48,142 |           0 |   61.5% |     0.0%
             1 |            128 |       79,333 |       48,868 |           0 |   62.0% |     0.0%
...
            16 |         16,384 |      103,547 |      103,541 |           0 |    0.0% |     0.0%
            16 |         32,768 |       73,652 |       73,636 |           0 |    0.0% |     0.0%
            16 |         65,536 |       26,006 |       26,006 |           0 |    0.0% |     0.0%
            16 |        131,072 |       19,434 |       19,433 |           0 |    0.0% |     0.0%


Plot saved to: /tmp/sweep_num_workers_1-16_x_prompt_length_1-131072_duration=3.0_max_in_flight=100000_pin=True.png
[screenshot: sweep plot, 2026-02-10]
$ -> python scripts/benchmark_httpclient.py -w 4:98:4 --stream --server-workers 16
...
==============================================================================================
Sweep Summary: num_workers
==============================================================================================
   num_workers |    Send Rate |    Recv Rate |   SSE-pkts/s | Outstanding |  Stall% |   Errors
----------------------------------------------------------------------------------------------
             4 |       14,787 |        4,805 |    4,814,196 |           0 |   86.2% |     0.0%
             8 |       19,579 |        9,480 |    9,499,254 |           0 |   81.9% |     0.0%
            12 |       24,152 |       13,855 |   13,882,346 |           0 |   78.9% |     0.0%
            16 |       27,946 |       17,755 |   17,790,948 |           0 |   73.8% |     0.0%
            20 |       31,415 |       21,201 |   21,243,302 |           0 |   67.9% |     0.0%
            24 |       33,869 |       23,596 |   23,643,404 |           0 |   61.6% |     0.0%
            28 |       36,177 |       26,064 |   26,116,147 |           0 |   56.2% |     0.0%
            32 |       38,805 |       28,520 |   28,576,789 |           0 |   48.8% |     0.0%
            36 |       41,288 |       30,458 |   30,518,869 |           0 |   42.9% |     0.0%
            40 |       43,885 |       32,100 |   32,164,009 |           0 |   36.0% |     0.0%
            44 |       45,934 |       35,565 |   35,635,725 |           0 |   27.0% |     0.0%
            48 |       46,878 |       36,733 |   36,806,714 |           0 |   22.8% |     0.0%
            52 |       47,495 |       35,641 |   35,711,910 |           0 |   19.9% |     0.0%
            56 |       49,991 |       37,925 |   38,001,183 |           0 |   14.1% |     0.0%
            60 |       53,353 |       40,325 |   40,405,908 |           0 |    9.0% |     0.0%
            64 |       56,786 |       43,011 |   43,097,440 |           0 |    5.6% |     0.0%
            68 |       60,130 |       45,769 |   45,860,070 |           0 |    2.4% |     0.0%
            72 |       63,623 |       49,915 |   50,015,115 |           0 |    0.5% |     0.0%
            76 |       61,537 |       51,891 |   51,994,283 |           0 |    0.0% |     0.0%
            80 |       59,533 |       54,540 |   54,648,673 |           0 |    0.0% |     0.0%
            84 |       60,139 |       55,623 |   55,734,724 |           0 |    0.0% |     0.0%
            88 |       60,035 |       59,991 |   60,111,393 |           0 |    0.0% |     0.0%
            92 |       61,096 |       61,075 |   61,196,969 |           0 |    0.0% |     0.0%
            96 |       61,690 |       61,674 |   61,797,769 |           0 |    0.0% |     0.0%
           100 |       62,063 |       62,055 |   62,179,067 |           0 |    0.0% |     0.0%
           104 |       62,567 |       62,559 |   62,683,778 |           0 |    0.0% |     0.0%
           108 |       63,729 |       63,720 |   63,847,133 |           0 |    0.0% |     0.0%
           112 |       62,865 |       50,857 |   50,959,197 |           0 |    0.0% |     0.0%
           116 |       63,247 |       50,351 |   50,451,365 |           0 |    0.0% |     0.0%
           120 |       63,367 |       52,499 |   52,603,988 |           0 |    0.0% |     0.0%
           124 |       62,059 |       52,932 |   53,037,533 |           0 |    0.0% |     0.0%
           128 |       61,529 |       51,434 |   51,536,902 |           0 |    0.0% |     0.0%
==============================================================================================

Plot saved to: /tmp/sweep_num_workers_4-128_duration=10.0_max_concurrency=100000_streaming=True.png
[screenshot: streaming sweep plot, 2026-02-13]

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

@viraatc viraatc requested a review from a team as a code owner February 10, 2026 08:38
Copilot AI review requested due to automatic review settings February 10, 2026 08:38
@gemini-code-assist

Summary of Changes

Hello @viraatc, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request introduces a robust set of performance utilities designed to thoroughly benchmark the HTTP client. It provides a dedicated script for running single performance tests or complex parameter sweeps, complete with live statistics, memory tracking, and CPU affinity controls. A new mock server is also included, allowing for isolated and high-throughput client-side performance measurements, with results automatically visualized through generated plots.

Highlights

  • New Performance Testing Utility: Introduced a new Python script (scripts/benchmark_httpclient.py) for comprehensive performance testing of the HTTP client, supporting both single runs and advanced parameter sweeps.
  • Max Throughput Mock Server: Developed a MaxThroughputServer (src/inference_endpoint/testing/max_throughput_server.py) to provide a high-throughput, low-latency mock OpenAI-compatible API server, enabling isolated client performance benchmarking.
  • Advanced Benchmarking Features: Implemented features within the benchmark utility for live statistics display, memory usage tracking, CPU affinity pinning for workers, and automatic plotting of sweep results using matplotlib.
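The CPU-affinity pinning mentioned above can be sketched as follows. This is a hypothetical `pin_to_cpu` helper, not the PR's code; note that `os.sched_setaffinity` is Linux-only, hence the feature guard:

```python
import os

def pin_to_cpu(cpu_index: int) -> bool:
    """Pin the calling process to a single CPU core, if the platform allows.

    Returns True on success, False when affinity control is unavailable
    (e.g. macOS) or the requested core cannot be used.
    """
    if not hasattr(os, "sched_setaffinity"):
        return False
    try:
        os.sched_setaffinity(0, {cpu_index})  # pid 0 = the current process
        return True
    except OSError:
        return False

# Each benchmark worker i could call pin_to_cpu(i % os.cpu_count()) before
# entering its send/recv loop, so workers don't migrate between cores
# mid-measurement and the sweep numbers stay comparable across runs.
```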


Changelog
  • pyproject.toml
    • Added matplotlib>=3.8.0 to the performance dependency group.
  • scripts/benchmark_httpclient.py
    • Added a new script to benchmark the HTTP client's send/receive rates.
    • Implemented argument parsing for various benchmark configurations, including endpoint, duration, worker counts, connections, prompt length, streaming, and memory tracking.
    • Included logic for single benchmark runs and parameter sweeps (cartesian product).
    • Developed LiveDisplay for real-time statistics and memory usage monitoring.
    • Integrated matplotlib for generating and saving plots of sweep results.
    • Added CPU affinity pinning support for workers.
    • Configured uvloop for high-performance I/O.
  • src/inference_endpoint/testing/max_throughput_server.py
    • Added a new module defining MaxThroughputServer, a minimal OpenAI-compatible LLM API server.
    • Implemented build_streaming_response and build_non_streaming_response to generate fixed, pre-compiled HTTP responses.
    • Included RequestParser and ServerProtocol for efficient request handling.
    • Designed the server to run with multiple worker processes using multiprocessing for high throughput.
    • Provided options for live statistics display and auto-assigned port binding.
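The "fixed, pre-compiled HTTP responses" described above amount to serializing the full HTTP reply once at startup and replaying the same bytes for every request. A sketch of the non-streaming case (illustrative body and header set, not the PR's code):

```python
import json

def build_non_streaming_response(model: str = "stub-model") -> bytes:
    """Serialize a fixed OpenAI-style chat completion once, at startup."""
    body = json.dumps({
        "id": "chatcmpl-0",
        "object": "chat.completion",
        "model": model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": "ok"},
            "finish_reason": "stop",
        }],
    }).encode()
    headers = (
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: application/json\r\n"
        b"Content-Length: " + str(len(body)).encode() + b"\r\n"
        b"Connection: keep-alive\r\n"
        b"\r\n"
    )
    return headers + body

# The protocol handler can then answer each request with a single
# transport.write(PREBUILT) and no per-request JSON encoding, which is
# what makes the stub server a roofline for client-side throughput.
PREBUILT = build_non_streaming_response()
```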
Activity
  • No human activity (comments, reviews, progress updates) was provided in the context for this pull request.


Copilot AI left a comment


Pull request overview

Adds a high-throughput stub server and a benchmarking script to measure HTTP client send/recv throughput (including sweep + plotting), plus a dependency update to support plotting.

Changes:

  • Introduces MaxThroughputServer: a minimal OpenAI-compatible server returning pre-built responses for roofline-style client benchmarking.
  • Adds scripts/benchmark_httpclient.py with single-run + sweep modes, live stats, optional memory tracking, and plot generation.
  • Adds matplotlib to dependencies to support sweep plotting.
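The streaming path exercised in the sweeps above (the SSE-pkts/s column) pairs naturally with a pre-compiled chunked SSE reply. A sketch, with a hypothetical helper and illustrative payloads rather than the PR's actual `build_streaming_response`:

```python
import json

def build_streaming_response_sketch(n_chunks: int = 3) -> bytes:
    """Pre-compile a fixed SSE chat-completion stream as a chunked HTTP/1.1 reply."""

    def chunk(payload: bytes) -> bytes:
        # HTTP/1.1 chunked transfer encoding: <hex length>\r\n<data>\r\n
        return f"{len(payload):x}".encode() + b"\r\n" + payload + b"\r\n"

    events = []
    for i in range(n_chunks):
        delta = json.dumps(
            {"choices": [{"index": 0, "delta": {"content": f"tok{i}"}}]}
        )
        events.append(f"data: {delta}\n\n".encode())
    events.append(b"data: [DONE]\n\n")  # OpenAI-style stream terminator

    head = (
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: text/event-stream\r\n"
        b"Transfer-Encoding: chunked\r\n"
        b"\r\n"
    )
    # "0\r\n\r\n" is the zero-length chunk that ends a chunked body.
    return head + b"".join(chunk(e) for e in events) + b"0\r\n\r\n"
```

Because the byte string is fixed, the client's SSE-packet rate measures only its own parsing and event-loop overhead, not server-side generation.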

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 10 comments.

  • src/inference_endpoint/testing/max_throughput_server.py — New minimal high-throughput HTTP stub server for isolating client throughput.
  • scripts/benchmark_httpclient.py — New benchmark utility with sweep modes, live stats, restartable local server, and plotting.
  • pyproject.toml — Adds matplotlib dependency (currently under test extras) for plot output.



@gemini-code-assist (bot) left a comment


Code Review

The pull request introduces a new performance testing utility for HTTP clients, along with a mock maximum throughput server. The utility supports single runs and parameter sweeps, including CPU affinity pinning and memory tracking. It also adds matplotlib as a dependency for plotting sweep results. The overall structure and functionality appear sound, providing a comprehensive tool for benchmarking. There are a couple of areas where maintainability and efficiency could be improved.

Copilot AI review requested due to automatic review settings February 10, 2026 08:51

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.



Copilot AI review requested due to automatic review settings February 10, 2026 08:56

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.



Copilot AI review requested due to automatic review settings February 10, 2026 09:08
@viraatc viraatc force-pushed the feat/viraatc-perf-utils branch from 88dc43d to d451922 Compare February 10, 2026 09:10

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.




github-actions bot commented Feb 10, 2026

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@arekay-nv (Collaborator) left a comment


Thanks for this - definitely useful. Would love to try it out.

Copilot AI review requested due to automatic review settings February 10, 2026 22:13

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.



Copilot AI review requested due to automatic review settings February 10, 2026 23:04

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/inference_endpoint/testing/max_throughput_server.py:1

  • The _restart_server function accesses private attributes of the MaxThroughputServer class, creating tight coupling. Consider adding a public restart or reconfigure method to the server class instead.

Copilot AI review requested due to automatic review settings February 10, 2026 23:11

Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.



@viraatc viraatc force-pushed the feat/viraatc-perf-utils branch from e77e711 to 8bbb5e6 Compare February 10, 2026 23:36
Copilot AI review requested due to automatic review settings February 10, 2026 23:37
@viraatc viraatc force-pushed the feat/viraatc-perf-utils branch from 8bbb5e6 to db397d0 Compare February 10, 2026 23:37

Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.



@viraatc viraatc force-pushed the feat/viraatc-perf-utils branch from db397d0 to 2cf3680 Compare February 11, 2026 00:05
Copilot AI review requested due to automatic review settings February 11, 2026 00:17
@viraatc viraatc force-pushed the feat/viraatc-perf-utils branch from 2cf3680 to e15536b Compare February 11, 2026 00:17

Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.



@arekay-nv (Collaborator) left a comment


Awesome. Thanks!

Copilot AI review requested due to automatic review settings February 13, 2026 00:47

Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.



Copilot AI review requested due to automatic review settings February 14, 2026 00:06

Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.



@viraatc viraatc force-pushed the feat/viraatc-perf-utils branch from 65efeb2 to 3465924 Compare February 14, 2026 00:13
Copilot AI review requested due to automatic review settings February 14, 2026 00:21

Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.



@viraatc viraatc merged commit 24272d0 into main Feb 14, 2026
10 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Feb 14, 2026
@viraatc viraatc deleted the feat/viraatc-perf-utils branch February 14, 2026 00:51