chore(http-client): cleanup types, improve coverage, remove orjson#121
Conversation
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
Summary of Changes (Gemini Code Assist): This pull request delivers a significant refactor and cleanup of the HTTP client, primarily addressing type-hinting inconsistencies and improving overall robustness and performance. The changes streamline the client's internal mechanisms, particularly around connection handling and event-loop management, leading to measurable performance gains. Additionally, comprehensive new tests have been introduced to verify the behavior of core client functionality such as polling, receiving, draining, and shutting down.
Pull request overview
Refactors the HTTP client types/initialization to reduce runtime asserts, and expands integration coverage for non-blocking/buffering APIs (poll/recv/drain/shutdown).
Changes:
- Added new integration tests covering poll(), recv(), drain(), and shutdown idempotency/behavior.
- Refactored worker/client/config typing by making several fields non-optional and removing some runtime asserts.
- Simplified the drain() implementation and adjusted the shutdown flow/idempotency checks.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Summary per file
| File | Description |
|---|---|
| tests/integration/endpoint_client/test_http_client.py | Adds new integration coverage for poll/recv/drain/shutdown and introduces shared client/query helpers. |
| src/inference_endpoint/endpoint_client/worker.py | Removes several runtime asserts and changes request firing to pass pooled connections explicitly. |
| src/inference_endpoint/endpoint_client/http_sample_issuer.py | Drops an assert that is no longer needed due to non-optional loop typing. |
| src/inference_endpoint/endpoint_client/http_client.py | Simplifies loop/thread setup, makes drain() more concise, and tweaks shutdown idempotency behavior. |
| src/inference_endpoint/endpoint_client/config.py | Makes adapter/accumulator/transport non-optional (defaulted in __post_init__). |
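As a rough illustration of the idempotency behavior the new shutdown tests target, here is a minimal sketch; the class below is a hypothetical stand-in, not the project's HTTPEndpointClient:

```python
class FakeClient:
    """Hypothetical stand-in for a client with idempotent shutdown."""

    def __init__(self) -> None:
        self._shutdown = False
        self.shutdown_calls = 0

    def shutdown(self) -> None:
        if self._shutdown:  # already shut down: no-op
            return
        self._shutdown = True
        self.shutdown_calls += 1  # real teardown work would happen here


client = FakeClient()
client.shutdown()
client.shutdown()  # second call must be a no-op
assert client.shutdown_calls == 1
```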
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Code Review
This pull request introduces several cleanups to type hinting, which simplifies the code by removing numerous assert is not None checks. It also significantly improves test coverage for the HTTP client's response retrieval and shutdown mechanisms. While most changes are positive, I've identified a couple of concerns regarding thread safety and shutdown logic that could introduce bugs. My review includes specific suggestions to address these potential issues.
src/inference_endpoint/endpoint_client/http_client.py (163-169)
This method override was crucial for thread safety and should not have been removed. HTTPEndpointClient is a synchronous wrapper intended for use from other threads. The underlying transport sockets are not thread-safe, so calls must be scheduled on the event-loop thread. The removed implementation correctly used loop.call_soon_threadsafe to achieve that; removing it introduces a race condition. Please restore this method.
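The scheduling pattern the reviewer describes can be sketched with plain asyncio; the class and method names below are illustrative stand-ins, not the project's code:

```python
import asyncio
import threading


class SyncClientSketch:
    """Minimal sketch: a sync facade whose work must run on a loop thread."""

    def __init__(self) -> None:
        self.loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        self._thread.start()
        self.sent: list = []

    def _issue_on_loop(self, query) -> None:
        # Runs on the event-loop thread, where loop-bound transports are safe.
        self.sent.append(query)

    def issue(self, query) -> None:
        # Called from any thread; call_soon_threadsafe is the supported way to
        # hand a callback to a loop running in another thread (plain call_soon
        # is not thread-safe).
        self.loop.call_soon_threadsafe(self._issue_on_loop, query)

    def shutdown(self) -> None:
        self.loop.call_soon_threadsafe(self.loop.stop)
        self._thread.join()
        self.loop.close()
```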
src/inference_endpoint/endpoint_client/http_client.py (126-143)
Moving self._shutdown = True to the end of the method creates a race condition. During the await self.worker_manager.shutdown() call, new requests can still be accepted by issue() because self._shutdown is False. These requests will likely fail ungracefully as workers are terminated, and they won't be counted as dropped. To ensure new requests are gracefully rejected during shutdown, self._shutdown = True should be set at the beginning of the method, immediately after the idempotency check.
if self._shutdown: # Already shutdown, no-op
return
self._shutdown = True
logger.info(f"[{self.client_id}] Shutting down...")
# Shutdown workers
await self.worker_manager.shutdown()
# Stop event loop if we own it (scheduled to run after this coroutine completes)
if self._owns_loop:
self.loop.call_soon(self.loop.stop)
if self._dropped_requests > 0:
logger.info(
f"[{self.client_id}] Dropped {self._dropped_requests} requests during shutdown"
)
logger.info(f"[{self.client_id}] Shutdown complete.")
Force-pushed from ba1a2a5 to ea40616.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Force-pushed from 298afff to 5f150c5.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.
Force-pushed from 5f150c5 to 0ce6261, then from 0ce6261 to 9103120.
Pull request overview
Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.
Force-pushed from 01ae3f5 to 90bd51d.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
src/inference_endpoint/endpoint_client/http_client.py:174
HTTPEndpointClient no longer overrides issue() to schedule onto the client's event-loop thread. AsyncHttpEndpointClient.issue() now calls self.pool.send(...) directly from the caller thread, but the underlying ZMQ sender can call loop.add_writer(...) (not thread-safe). This can cause races or runtime errors under backpressure; restore a thread-safe issue() wrapper (e.g., loop.call_soon_threadsafe) for the sync client.
class HTTPEndpointClient(AsyncHttpEndpointClient):
"""
Sync HTTP client for LLM inference.
Inherits from AsyncHttpEndpointClient and provides sync interface.
Usage:
client = HTTPEndpointClient(config)
client.issue(query)
"""
def shutdown(self) -> None: # type: ignore[override]
"""Sync shutdown wrapper - blocks until base class async shutdown completes."""
if self._shutdown: # Already shutdown, no-op
return
asyncio.run_coroutine_threadsafe(super().shutdown(), self.loop).result()
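The sync shutdown wrapper above relies on asyncio.run_coroutine_threadsafe blocking until the coroutine completes on the loop's own thread. A self-contained sketch of that mechanic, with illustrative names in place of the project's classes:

```python
import asyncio
import threading

# A loop running in a background thread, as the client owns internally.
loop = asyncio.new_event_loop()
thread = threading.Thread(target=loop.run_forever, daemon=True)
thread.start()

state = {"shutdown": False}


async def async_shutdown() -> None:
    # Stands in for the base class's async shutdown (worker teardown, etc.).
    await asyncio.sleep(0)
    state["shutdown"] = True


def sync_shutdown() -> None:
    # Blocks the calling thread until the coroutine finishes on the loop thread.
    asyncio.run_coroutine_threadsafe(async_shutdown(), loop).result()


sync_shutdown()

loop.call_soon_threadsafe(loop.stop)
thread.join()
loop.close()
```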
nv-alicheng left a comment:
Changes look like they do what the PR is about, but what is the reasoning for removing orjson? Is msgpack encoding faster, or does it just fit the pipeline better since we use msgpack structs throughout the project?
orjson doesn't support py3.14t (GIL-free builds) yet; msgspec already does.
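For context, msgspec.json mirrors orjson's bytes-in/bytes-out interface fairly closely, which makes the swap largely mechanical. A hedged sketch (with a stdlib-json fallback so it runs even where msgspec isn't installed):

```python
try:
    import msgspec

    encode = msgspec.json.encode  # returns bytes, like orjson.dumps
    decode = msgspec.json.decode  # accepts bytes, like orjson.loads
except ImportError:
    # Fallback purely so this sketch runs without msgspec installed.
    import json

    def encode(obj) -> bytes:
        return json.dumps(obj).encode()

    def decode(buf):
        return json.loads(buf)


payload = {"model": "llm", "tokens": [1, 2, 3]}
data = encode(payload)
assert isinstance(data, bytes)       # same wire type as orjson
assert decode(data) == payload       # round-trip
```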
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.
Force-pushed from aed4906 to f6bf1af.
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
src/inference_endpoint/metrics/recorder.py:400
- When JSON encoding fails, the recorded error payload uses "error_type": "JSONEncodeError", but the exception being caught is now msgspec.EncodeError. This makes logs harder to interpret; consider updating error_type to match the actual exception/library (e.g., msgspec.EncodeError).
msgspec.json.encode(
    {
        "error_type": "JSONEncodeError",
        "error_message": str(e),
    }
)
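One way to keep error_type aligned with the exception actually caught, as the review suggests, is to derive it from the exception class. A minimal sketch using stdlib json as a stand-in (the project itself uses msgspec.json, where the caught type would be msgspec.EncodeError):

```python
import json  # stdlib stand-in here; the project uses msgspec.json


def encode_record(record: dict) -> bytes:
    """Encode a record, falling back to an error payload whose error_type
    names the exception actually caught, so logs match reality."""
    try:
        return json.dumps(record).encode()
    except (TypeError, ValueError) as e:
        # With msgspec this branch would catch msgspec.EncodeError instead.
        return json.dumps(
            {
                "error_type": type(e).__name__,  # e.g. "EncodeError" with msgspec
                "error_message": str(e),
            }
        ).encode()
```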
Force-pushed from 59d565e to c05b75a, then from 0c3eecd to c05b75a, then from c05b75a to e325f21.
Continuing in #159.
What does this PR do?
(1) before:
(2) before, with asserts disabled:
after: (within run-to-run variation of (2))
Measurements were made using standalone MR #122.
Type of change
Related issues
Testing
Checklist