Add documentation on request cancellation (#6403) (#6407)
* Add documentation on request cancellation

* Include python backend

* Update docs/user_guide/request_cancellation.md

* Update docs/user_guide/request_cancellation.md

* Update docs/README.md

* Update docs/user_guide/request_cancellation.md

* Remove inflight term from the main documentation

* Address review comments

* Fix

* Update docs/user_guide/request_cancellation.md

* Fix

---------

Co-authored-by: Iman Tabrizian <iman.tabrizian@gmail.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Co-authored-by: Jacky <18255193+kthui@users.noreply.github.com>
5 people authored and mc-nv committed Oct 26, 2023
1 parent 21ee57a commit 61f4a67
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions docs/user_guide/request_cancellation.md
@@ -28,7 +28,7 @@

# Request Cancellation

Starting from r23.10, Triton supports handling request cancellation received
Starting from 23.10, Triton supports handling request cancellation received
from the gRPC client or a C API user. Long running inference requests such
as for auto generative large language models may run for an indeterminate
amount of time or indeterminate number of steps. Additionally clients may
@@ -39,7 +39,7 @@ resources.

## Issuing Request Cancellation

### In-Process C API
### Triton C API

[In-Process Triton Server C API](../customization_guide/inference_protocols.md#in-process-triton-server-api) has been enhanced with `TRITONSERVER_InferenceRequestCancel`
and `TRITONSERVER_InferenceRequestIsCancelled` to issue cancellation and query
@@ -77,9 +77,9 @@ detection and handling within Triton core is work in progress.
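To illustrate the two C API calls named in the section above, here is a minimal sketch (not part of this commit or the documented diff). It assumes the request was already created with `TRITONSERVER_InferenceRequestNew` and submitted with `TRITONSERVER_ServerInferAsync`; server setup and most error handling are omitted.

```c
#include <stdbool.h>
#include <stdio.h>

#include "triton/core/tritonserver.h"

void
CancelAndCheck(TRITONSERVER_InferenceRequest* request)
{
  /* Ask Triton to cancel the in-flight request. */
  TRITONSERVER_Error* err = TRITONSERVER_InferenceRequestCancel(request);
  if (err != NULL) {
    fprintf(stderr, "cancel failed: %s\n", TRITONSERVER_ErrorMessage(err));
    TRITONSERVER_ErrorDelete(err);
    return;
  }

  /* Query whether cancellation has been requested on the request. */
  bool is_cancelled = false;
  err = TRITONSERVER_InferenceRequestIsCancelled(request, &is_cancelled);
  if (err != NULL) {
    fprintf(stderr, "query failed: %s\n", TRITONSERVER_ErrorMessage(err));
    TRITONSERVER_ErrorDelete(err);
    return;
  }
  printf("cancellation requested: %s\n", is_cancelled ? "yes" : "no");
}
```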

## Handling in Backend

Upon receiving request cancellation, Triton does its best to terminate request
Upon receiving request cancellation, triton does its best to terminate request
at various points. However, once a request has been given to the backend
for execution, it is up to the individual backends to detect and handle
for execution, it is upto the individual backends to detect and handle
request termination.
Currently, the following backends support early termination:
- [vLLM backend](https://github.com/triton-inference-server/vllm_backend)
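On the backend side, a backend can poll the cancellation status of a request it is executing and stop early. The following C sketch is an illustration, not part of this commit; it assumes the `TRITONBACKEND_RequestIsCancelled` query from the backend API and a `request` handle received in `TRITONBACKEND_ModelInstanceExecute`.

```c
#include <stdbool.h>

#include "triton/core/tritonbackend.h"

/* Returns true if cancellation has been requested for this request, so the
 * backend can terminate it early and release resources. */
static bool
ShouldStopEarly(TRITONBACKEND_Request* request)
{
  bool is_cancelled = false;
  TRITONSERVER_Error* err =
      TRITONBACKEND_RequestIsCancelled(request, &is_cancelled);
  if (err != NULL) {
    /* If the query itself fails, keep running rather than dropping work. */
    TRITONSERVER_ErrorDelete(err);
    return false;
  }
  return is_cancelled;
}
```

In generative or decoupled backends such a check would typically sit between generation steps, so a cancelled request can be released without computing further tokens.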
