
Rework Backend to Native HTTP Requests and Enhance API Compatibility & Performance #91

Merged: markurtz merged 18 commits from openai_backend_fixes into main on Mar 12, 2025

Conversation

@markurtz (Member) commented on Mar 6, 2025

Summary

This PR restructures the backend of guidellm to utilize native HTTP requests with httpx and HTTP/2 standards, significantly improving performance and scalability. It aligns the backend interface closely with the OpenAI API specifications, enabling smoother integration paths for future expansions, including multi-backend support.
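
For illustration, here is a minimal sketch of the native request pattern this PR moves to: an httpx AsyncClient with HTTP/2 enabled, posting to an OpenAI-compatible chat completions endpoint. The server URL, payload fields, and model name are assumptions for the example, not guidellm's actual backend code.

```python
# Minimal sketch of the httpx-based request flow described above; not the PR's
# actual implementation. HTTP/2 support requires the "httpx[http2]" extra (h2).
import asyncio

import httpx


async def main() -> None:
    async with httpx.AsyncClient(http2=True, timeout=60.0) as client:
        response = await client.post(
            "http://localhost:8000/v1/chat/completions",  # assumed OpenAI-compatible server
            json={
                "model": "my-model",  # placeholder model name
                "messages": [{"role": "user", "content": "Hello"}],
            },
        )
        response.raise_for_status()
        print(response.json()["choices"][0]["message"]["content"])


asyncio.run(main())
```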

Details

  • Replaced legacy backend logic with native HTTP request handling using httpx, enabling HTTP/2 support for optimized performance.
  • Established a clear backend interface adhering strictly to OpenAI's API standards, simplifying future compatibility and extension.
  • Cleaned up backend file structure for clarity and maintainability:
    • Removed legacy files: base.py, load_generator.py.
    • Added new modular backend components: backend.py, scheduler/backend_worker.py, and structured response handling in response.py.
  • Optimized request-response flow, reducing overhead and latency.
  • Improved configuration management (config.py) to better support environment-based settings (see the sketch after this list).
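
As a rough illustration of the environment-based configuration mentioned above, one common pattern is a pydantic-settings class whose fields can be overridden via environment variables. The class name, field names, and env prefix below are assumptions for the example, not guidellm's actual config.py schema.

```python
# Illustrative sketch only: environment-overridable settings via pydantic-settings.
# Names and defaults are assumptions, not guidellm's real configuration schema.
from typing import Optional

from pydantic_settings import BaseSettings, SettingsConfigDict


class BackendSettings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="GUIDELLM__")

    target: str = "http://localhost:8000"  # base URL of the OpenAI-compatible server
    api_key: Optional[str] = None          # sent as a Bearer token when set
    http2: bool = True                     # enable HTTP/2 on the httpx client
    request_timeout: float = 60.0          # per-request timeout in seconds


# e.g. GUIDELLM__TARGET=http://host:9000 overrides the default target
settings = BackendSettings()
```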

Testing

  • Comprehensive new tests added for backend modules and response handling.
  • All existing unit tests updated and verified to pass without regressions.
  • Manual testing performed for key API flows, verifying accuracy, stability, and significant performance improvements.

Related Issues

@markurtz markurtz self-assigned this Mar 6, 2025
@markurtz markurtz marked this pull request as ready for review March 8, 2025 10:26
@markurtz markurtz requested review from sjmonson and anmarques March 8, 2025 10:27
@markurtz markurtz force-pushed the openai_backend_fixes branch from 24f753a to 9e4bc84 on March 8, 2025 16:44
@sjmonson (Collaborator) left a comment:

Sorry this took so long to review. My testing showed that there was a big regression in the time between each request for all scenarios, but I believe it's just due to the start and stop timing being moved closer to the actual start and stop of the request.

[attached image: 2025-03-11T15-10-34]

Otherwise LGTM

@markurtz (Member, Author) replied:

> Sorry this took so long to review. My testing showed that there was a big regression in the time between each request for all scenarios, but I believe it's just due to the start and stop timing being moved closer to the actual start and stop of the request.
>
> [attached image: 2025-03-11T15-10-34]
>
> Otherwise LGTM

Thanks @sjmonson! I think that's likely. Also, provided we're not seeing a severe regression, I would say let's focus on optimizing the performance as part of #96.

@markurtz markurtz changed the title from "OpenAI HTTP Backend Implementation" to "Rework Backend to Native HTTP Requests and Enhance API Compatibility & Performance" on Mar 12, 2025
@markurtz markurtz merged commit 3b346b5 into main Mar 12, 2025
9 checks passed
@markurtz markurtz deleted the openai_backend_fixes branch March 12, 2025 15:42
markurtz pushed a commit that referenced this pull request May 7, 2025
Prior to the `openai_server` -> `openai_http` refactor (#91), we were
using the `extra_query` parameter [in the OpenAI
client](https://github.com/openai/openai-python/blob/fad098ffad7982a5150306a3d17f51ffef574f2e/src/openai/resources/models.py#L50)
to send custom query parameters to the OpenAI server in requests made by
guidellm. This PR adds that parameter to the new `OpenAIHTTPBackend`,
making it possible to add custom query parameters that are included in
every request sent to the server.
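
As a rough sketch of the idea described in that commit, an extra_query mapping can be attached to every outgoing request through httpx's params argument. The class and method names below are illustrative, not the exact OpenAIHTTPBackend code.

```python
# Illustrative sketch: forwarding custom query parameters on every request via
# httpx's `params` argument. Not the actual OpenAIHTTPBackend implementation.
from typing import Any, Optional

import httpx


class OpenAIHTTPBackendSketch:
    def __init__(self, target: str, extra_query: Optional[dict[str, Any]] = None) -> None:
        self._client = httpx.AsyncClient(base_url=target, http2=True)
        self._extra_query = extra_query or {}

    async def chat_completions(self, payload: dict[str, Any]) -> dict[str, Any]:
        # extra_query is appended to the URL of every request,
        # e.g. POST /v1/chat/completions?tenant=abc
        response = await self._client.post(
            "/v1/chat/completions",
            json=payload,
            params=self._extra_query,
        )
        response.raise_for_status()
        return response.json()
```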
Development

Successfully merging this pull request may close these issues.

  • Clean up JSON output
  • Client-side prompt token count is inaccurate
  • guidance_report.json from default flow is very large