Add documentation on request cancellation (#6403) (#6407)
* Add documentation on request cancellation

* Include python backend

* Update docs/user_guide/request_cancellation.md

* Update docs/user_guide/request_cancellation.md

* Update docs/README.md

* Update docs/user_guide/request_cancellation.md

* Remove inflight term from the main documentation

* Address review comments

* Fix

* Update docs/user_guide/request_cancellation.md

* Fix

---------

Co-authored-by: Iman Tabrizian <iman.tabrizian@gmail.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Co-authored-by: Jacky <18255193+kthui@users.noreply.github.com>
5 people authored and mc-nv committed Oct 26, 2023
1 parent 21ee57a commit 61f4a67
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions docs/user_guide/request_cancellation.md
@@ -28,7 +28,7 @@

# Request Cancellation

Starting from r23.10, Triton supports handling request cancellation received
Starting from 23.10, Triton supports handling request cancellation received
from the gRPC client or a C API user. Long running inference requests such
as for auto generative large language models may run for an indeterminate
amount of time or indeterminate number of steps. Additionally clients may
@@ -39,7 +39,7 @@ resources.

## Issuing Request Cancellation

### In-Process C API
### Triton C API

[In-Process Triton Server C API](../customization_guide/inference_protocols.md#in-process-triton-server-api) has been enhanced with `TRITONSERVER_InferenceRequestCancel`
and `TRITONSERVER_InferenceRequestIsCancelled` to issue cancellation and query
@@ -77,9 +77,9 @@ detection and handling within Triton core is work in progress.
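To illustrate the two C API calls named in the section above, here is a minimal sketch (not part of this commit or the documented diff). It assumes the request was already created with `TRITONSERVER_InferenceRequestNew` and submitted with `TRITONSERVER_ServerInferAsync`; server setup and most error handling are omitted.

```c
#include <stdbool.h>
#include <stdio.h>

#include "triton/core/tritonserver.h"

void
CancelAndCheck(TRITONSERVER_InferenceRequest* request)
{
  /* Ask Triton to cancel the in-flight request. */
  TRITONSERVER_Error* err = TRITONSERVER_InferenceRequestCancel(request);
  if (err != NULL) {
    fprintf(stderr, "cancel failed: %s\n", TRITONSERVER_ErrorMessage(err));
    TRITONSERVER_ErrorDelete(err);
    return;
  }

  /* Query whether cancellation has been requested on the request. */
  bool is_cancelled = false;
  err = TRITONSERVER_InferenceRequestIsCancelled(request, &is_cancelled);
  if (err != NULL) {
    fprintf(stderr, "query failed: %s\n", TRITONSERVER_ErrorMessage(err));
    TRITONSERVER_ErrorDelete(err);
    return;
  }
  printf("cancellation requested: %s\n", is_cancelled ? "yes" : "no");
}
```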

## Handling in Backend

Upon receiving request cancellation, Triton does its best to terminate request
Upon receiving request cancellation, triton does its best to terminate request
at various points. However, once a request has been given to the backend
for execution, it is up to the individual backends to detect and handle
for execution, it is upto the individual backends to detect and handle
request termination.
Currently, the following backends support early termination:
- [vLLM backend](https://github.com/triton-inference-server/vllm_backend)
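On the backend side, a backend can poll the cancellation status of a request it is executing and stop early. The following C sketch is an illustration, not part of this commit; it assumes the `TRITONBACKEND_RequestIsCancelled` query from the backend API and a `request` handle received in `TRITONBACKEND_ModelInstanceExecute`.

```c
#include <stdbool.h>

#include "triton/core/tritonbackend.h"

/* Returns true if cancellation has been requested for this request, so the
 * backend can terminate it early and release resources. */
static bool
ShouldStopEarly(TRITONBACKEND_Request* request)
{
  bool is_cancelled = false;
  TRITONSERVER_Error* err =
      TRITONBACKEND_RequestIsCancelled(request, &is_cancelled);
  if (err != NULL) {
    /* If the query itself fails, keep running rather than dropping work. */
    TRITONSERVER_ErrorDelete(err);
    return false;
  }
  return is_cancelled;
}
```

In generative or decoupled backends such a check would typically sit between generation steps, so a cancelled request can be released without computing further tokens.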
