Support for http generate request cancellation and segfault fix #6591
Conversation
Also, please make sure that L0_infer_valgrind tests pass after this change.
Looks generally good to me. Left some comments for follow-ups; they don't need to block the cherry-pick.
This looks great! I will introduce the RequestReleasePayload class in the gRPC frontend too, which would fix some lifetime issues that we are still observing.
LGTM. Not sure whether the SageMaker server has any unique behavior, so we will want to make sure L0_sagemaker passes.
Tested, and the expected tests are passing.
Hooking the request free callback from evhtp to support request cancellation.
From inspection, all request free operations have to go through:
https://github.com/triton-inference-server/third_party/blob/8af7da2ae3f4938a8b497728fcca6460797c402f/libevhtp/libevhtp/evhtp.c#L1233
And freeing a connection frees its request first:
https://github.com/triton-inference-server/third_party/blob/8af7da2ae3f4938a8b497728fcca6460797c402f/libevhtp/libevhtp/evhtp.c#L5151
So when a client disconnects, we can trap the request free callback and cancel the request.
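As a rough illustration of that path (not the PR's exact code), a minimal sketch that installs the request_fini hook and cancels the in-flight request when it fires; the helper name `RequestFiniHook` is an assumption, and the hook argument is assumed to be kept alive by the shared ownership described below:

```cpp
#include <evhtp.h>

#include "triton/core/tritonserver.h"

// Invoked by libevhtp just before it frees the evhtp request, e.g. during
// connection teardown. The request object must not be touched after this
// hook returns. The hook is removed once the final response completes (see
// below), so it only fires for abnormal frees such as client disconnects.
static evhtp_res
RequestFiniHook(evhtp_request_t* req, void* arg)
{
  auto* triton_request = static_cast<TRITONSERVER_InferenceRequest*>(arg);

  // Treat the free as a client disconnect and ask the core to cancel the
  // in-flight request; cancellation is best effort.
  TRITONSERVER_Error* err =
      TRITONSERVER_InferenceRequestCancel(triton_request);
  if (err != nullptr) {
    TRITONSERVER_ErrorDelete(err);
  }

  return EVHTP_RES_OK;
}

void
InstallCancellationHook(
    evhtp_request_t* req, TRITONSERVER_InferenceRequest* triton_request)
{
  // Every request free goes through libevhtp's request_fini hook, including
  // the frees triggered by connection teardown, so this catches disconnects.
  evhtp_request_set_hook(
      req, evhtp_hook_on_request_fini,
      reinterpret_cast<evhtp_hook>(RequestFiniHook), triton_request);
}
```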
There are three objects with lifetimes to maintain:
- evhtp_req. The request object for libevhtp. It is created by libevhtp and freed by libevhtp when the connection is freed. We add the request_fini hook to get notified when this object is freed and use that as an indication that the client has exited and the request should be cancelled. The req object cannot be used after the request free hook fires. We remove the hook once the response is complete.
- InferRequestClass. Created by the HTTP server to wrap the evhtp_req and TRITONSERVER_InferenceRequest objects. It is used in the response callbacks to send HTTP responses. Its lifetime spans from request creation to the final response for the request.
- TRITONSERVER_InferenceRequest. Created by the HTTP server and used to send inputs to the core; it is also used to cancel requests. Its lifetime spans from request creation to the final response AND the request release callback (sent from the core).
We use a shared_ptr to manage the lifetime of the InferenceRequest. The shared_ptr is held by the RequestReleasePayload (freed in the request release callback) and by the InferRequestClass (freed in the final response callback). This enables us to use the request object to cancel a request after the core has released it but before the final response is received.
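A minimal sketch of that shared ownership, with assumed names rather than the PR's exact classes: the raw request is wrapped in a shared_ptr whose deleter calls TRITONSERVER_InferenceRequestDelete, and one copy is parked in a release payload that the core's release callback deletes:

```cpp
#include <memory>
#include <utility>

#include "triton/core/tritonserver.h"

using TritonRequestPtr = std::shared_ptr<TRITONSERVER_InferenceRequest>;

// Wrap the raw request in a shared_ptr whose deleter performs the actual
// TRITONSERVER_InferenceRequestDelete. The request is only destroyed once
// both the core (release callback) and the frontend (final response) have
// dropped their references.
TritonRequestPtr
WrapTritonRequest(TRITONSERVER_InferenceRequest* raw_request)
{
  return TritonRequestPtr(raw_request, [](TRITONSERVER_InferenceRequest* r) {
    TRITONSERVER_Error* err = TRITONSERVER_InferenceRequestDelete(r);
    if (err != nullptr) {
      TRITONSERVER_ErrorDelete(err);
    }
  });
}

// Holds the core-side reference; deleted from the request release callback.
class RequestReleasePayload {
 public:
  explicit RequestReleasePayload(TritonRequestPtr request)
      : request_(std::move(request))
  {
  }

 private:
  TritonRequestPtr request_;
};

// Matches TRITONSERVER_InferenceRequestReleaseFn_t. The core calls this when
// it no longer needs the request; deleting the payload drops that reference.
void
InferRequestComplete(
    TRITONSERVER_InferenceRequest* request, const uint32_t flags, void* userp)
{
  if ((flags & TRITONSERVER_REQUEST_RELEASE_ALL) != 0) {
    delete reinterpret_cast<RequestReleasePayload*>(userp);
  }
}
```

The payload would be passed as the userp of TRITONSERVER_InferenceRequestSetReleaseCallback, while InferRequestClass keeps the second shared_ptr copy until the final response. Whichever reference is dropped last triggers the deleter, so a cancellation issued after release but before the final response still operates on a live request object.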
Some minor refactoring to store the inference request in the InferRequestClass.
Moved the callbacks to the InferRequestClass to make Generate and InferRequestClass similar and to avoid having to make protected members public.
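A drastically simplified skeleton of how the pieces could sit together on the class (member names assumed, actual response serialization and sending omitted): the response-complete callback is a static member, it removes the request_fini hook once the final flag is seen, and it drops the frontend's shared reference:

```cpp
#include <evhtp.h>

#include <memory>
#include <utility>

#include "triton/core/tritonserver.h"

class InferRequestClass {
 public:
  InferRequestClass(
      evhtp_request_t* req,
      std::shared_ptr<TRITONSERVER_InferenceRequest> triton_request)
      : req_(req), triton_request_(std::move(triton_request))
  {
  }

  // Matches TRITONSERVER_InferenceResponseCompleteFn_t; userp is the
  // InferRequestClass instance created when the request arrived.
  static void InferResponseComplete(
      TRITONSERVER_InferenceResponse* response, const uint32_t flags,
      void* userp);

 protected:
  evhtp_request_t* req_;
  // Frontend-side shared reference; dropped once the final response is sent.
  std::shared_ptr<TRITONSERVER_InferenceRequest> triton_request_;
};

void
InferRequestClass::InferResponseComplete(
    TRITONSERVER_InferenceResponse* response, const uint32_t flags,
    void* userp)
{
  auto* infer_request = reinterpret_cast<InferRequestClass*>(userp);

  if ((flags & TRITONSERVER_RESPONSE_COMPLETE_FINAL) != 0) {
    // The response finished normally: a later request free is just teardown,
    // so stop listening for it and drop the frontend's request reference.
    evhtp_unset_hook(
        &infer_request->req_->hooks, evhtp_hook_on_request_fini);
    infer_request->triton_request_.reset();
  }

  if (response != nullptr) {
    TRITONSERVER_Error* err = TRITONSERVER_InferenceResponseDelete(response);
    if (err != nullptr) {
      TRITONSERVER_ErrorDelete(err);
    }
  }
}
```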