
Rework Backend to Native HTTP Requests and Enhance API Compatibility & Performance #91

Merged: markurtz merged 18 commits from openai_backend_fixes into main on Mar 12, 2025

Conversation

@markurtz (Member) commented on Mar 6, 2025

Summary

This PR restructures the backend of guidellm to utilize native HTTP requests with httpx and HTTP/2 standards, significantly improving performance and scalability. It aligns the backend interface closely with the OpenAI API specifications, enabling smoother integration paths for future expansions, including multi-backend support.
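
For illustration, here is a minimal sketch of the native request pattern this PR moves to: an httpx AsyncClient with HTTP/2 enabled, posting to an OpenAI-compatible chat completions endpoint. The server URL, payload fields, and model name are assumptions for the example, not guidellm's actual backend code.

```python
# Minimal sketch of the httpx-based request flow described above; not the PR's
# actual implementation. HTTP/2 support requires the "httpx[http2]" extra (h2).
import asyncio

import httpx


async def main() -> None:
    async with httpx.AsyncClient(http2=True, timeout=60.0) as client:
        response = await client.post(
            "http://localhost:8000/v1/chat/completions",  # assumed OpenAI-compatible server
            json={
                "model": "my-model",  # placeholder model name
                "messages": [{"role": "user", "content": "Hello"}],
            },
        )
        response.raise_for_status()
        print(response.json()["choices"][0]["message"]["content"])


asyncio.run(main())
```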

Details

  • Replaced legacy backend logic with native HTTP request handling using httpx, enabling HTTP/2 support for optimized performance.
  • Established a clear backend interface adhering strictly to OpenAI's API standards, simplifying future compatibility and extension.
  • Cleaned up backend file structure for clarity and maintainability:
    • Removed legacy files: base.py, load_generator.py.
    • Added new modular backend components: backend.py, scheduler/backend_worker.py, and structured response handling in response.py.
  • Optimized request-response flow, reducing overhead and latency.
  • Improved configuration management (config.py) to better support environment-based settings (see the sketch after this list).
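
As a rough illustration of the environment-based configuration mentioned above, one common pattern is a pydantic-settings class whose fields can be overridden via environment variables. The class name, field names, and env prefix below are assumptions for the example, not guidellm's actual config.py schema.

```python
# Illustrative sketch only: environment-overridable settings via pydantic-settings.
# Names and defaults are assumptions, not guidellm's real configuration schema.
from typing import Optional

from pydantic_settings import BaseSettings, SettingsConfigDict


class BackendSettings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="GUIDELLM__")

    target: str = "http://localhost:8000"  # base URL of the OpenAI-compatible server
    api_key: Optional[str] = None          # sent as a Bearer token when set
    http2: bool = True                     # enable HTTP/2 on the httpx client
    request_timeout: float = 60.0          # per-request timeout in seconds


# e.g. GUIDELLM__TARGET=http://host:9000 overrides the default target
settings = BackendSettings()
```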

Testing

  • Comprehensive new tests added for backend modules and response handling.
  • All existing unit tests updated and verified to pass without regressions.
  • Manual testing performed for key API flows, verifying accuracy, stability, and significant performance improvements.

Related Issues

@markurtz markurtz self-assigned this Mar 6, 2025
@markurtz markurtz marked this pull request as ready for review March 8, 2025 10:26
@markurtz markurtz requested review from sjmonson and anmarques March 8, 2025 10:27
@markurtz markurtz force-pushed the openai_backend_fixes branch from 24f753a to 9e4bc84 on March 8, 2025 16:44
@sjmonson (Collaborator) left a comment:

Sorry this took so long to review. My testing showed that there was a big regression in the time between each request for all scenarios, but I believe it's just due to the start and stop timing being moved closer to the actual start and stop of the request.

[attached image: 2025-03-11T15-10-34]

Otherwise LGTM

@markurtz (Member, Author) replied:

> Sorry this took so long to review. My testing showed that there was a big regression in the time between each request for all scenarios, but I believe it's just due to the start and stop timing being moved closer to the actual start and stop of the request.
>
> [attached image: 2025-03-11T15-10-34]
>
> Otherwise LGTM

Thanks @sjmonson! I think that's likely. Also, provided we're not seeing a severe regression, I would say let's focus on optimizing the performance as part of #96.

@markurtz markurtz changed the title from "OpenAI HTTP Backend Implementation" to "Rework Backend to Native HTTP Requests and Enhance API Compatibility & Performance" on Mar 12, 2025
@markurtz markurtz merged commit 3b346b5 into main Mar 12, 2025
9 checks passed
@markurtz markurtz deleted the openai_backend_fixes branch March 12, 2025 15:42
markurtz pushed a commit that referenced this pull request May 7, 2025
Prior to the `openai_server` -> `openai_http` refactor (#91), we were
using the `extra_query` parameter [in the OpenAI
client](https://github.com/openai/openai-python/blob/fad098ffad7982a5150306a3d17f51ffef574f2e/src/openai/resources/models.py#L50)
to send custom query parameters to the OpenAI server in requests made by
guidellm. This PR adds that parameter to the new `OpenAIHTTPBackend`,
making it possible to add custom query parameters that are included in
every request sent to the server.
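
As a rough sketch of the idea described in that commit, an extra_query mapping can be attached to every outgoing request through httpx's params argument. The class and method names below are illustrative, not the exact OpenAIHTTPBackend code.

```python
# Illustrative sketch: forwarding custom query parameters on every request via
# httpx's `params` argument. Not the actual OpenAIHTTPBackend implementation.
from typing import Any, Optional

import httpx


class OpenAIHTTPBackendSketch:
    def __init__(self, target: str, extra_query: Optional[dict[str, Any]] = None) -> None:
        self._client = httpx.AsyncClient(base_url=target, http2=True)
        self._extra_query = extra_query or {}

    async def chat_completions(self, payload: dict[str, Any]) -> dict[str, Any]:
        # extra_query is appended to the URL of every request,
        # e.g. POST /v1/chat/completions?tenant=abc
        response = await self._client.post(
            "/v1/chat/completions",
            json=payload,
            params=self._extra_query,
        )
        response.raise_for_status()
        return response.json()
```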
Development

Successfully merging this pull request may close these issues.

  • Clean up JSON output
  • Client-side prompt token count is inaccurate
  • guidance_report.json from default flow is very large