fix metrics readme (#72)
Fix README dead links 
Test added here triton-inference-server/server#4926
jbkyang-nvi authored Oct 3, 2022
1 parent 4ab5353 commit ebb4aa6
Showing 4 changed files with 23 additions and 24 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -135,7 +135,7 @@ backends](#backends).
Yes. See [Backend Shared Library](#backend-shared-library) for general
information about how the shared library implementing a backend is
managed by Triton, and [Triton with Unsupported and Custom
-Backends](https://github.com/triton-inference-server/server/blob/main/docs/compose.md#triton-with-unsupported-and-custom-backends)
+Backends](https://github.com/triton-inference-server/server/blob/main/docs/customization_guide/compose.md#triton-with-unsupported-and-custom-backends)
for documentation on how to add your backend to the released Triton
Docker image. For a standard install the globally available backends
are in /opt/tritonserver/backends.
@@ -190,7 +190,7 @@ to override the default.
Typically you will install your backend into the global backend
directory. For example, if using Triton Docker images you can follow
the instructions in [Triton with Unsupported and Custom
-Backends](https://github.com/triton-inference-server/server/blob/main/docs/compose.md#triton-with-unsupported-and-custom-backends). Continuing
+Backends](https://github.com/triton-inference-server/server/blob/main/docs/customization_guide/compose.md#triton-with-unsupported-and-custom-backends). Continuing
the example of a backend named "mybackend", you would install into the
Triton image as:

@@ -508,7 +508,7 @@ what can be achieved from the decoupled API.
Study the documentation of these TRITONBACKEND_* functions in
[tritonbackend.h](https://github.com/triton-inference-server/core/blob/main/include/triton/core/tritonbackend.h)
for more details on these APIs. Read
-[Decoupled Backends and Models](https://github.com/triton-inference-server/server/blob/main/docs/decoupled_models.md)
+[Decoupled Backends and Models](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/decoupled_models.md)
for more details on how to host a decoupled model.
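
To make that flow concrete, here is a minimal, hypothetical C++ sketch of sending more than one response for a single request through the response-factory API in tritonbackend.h. The helper name `SendTwoResponses`, the fixed response count, and the omitted output-tensor population and error handling are assumptions for illustration only, not code from this repository.

```cpp
// Hypothetical sketch: a decoupled backend emits two responses for one
// request via a response factory. Output tensors and error checks omitted.
#include "triton/core/tritonbackend.h"

static void
SendTwoResponses(TRITONBACKEND_Request* request)
{
  TRITONBACKEND_ResponseFactory* factory = nullptr;
  TRITONBACKEND_ResponseFactoryNew(&factory, request);

  for (int i = 0; i < 2; i++) {
    TRITONBACKEND_Response* response = nullptr;
    TRITONBACKEND_ResponseNewFromFactory(&response, factory);

    // ... create and fill output tensors on 'response' here ...

    // Flag the last response as final so the client knows the stream ended.
    const uint32_t flags =
        (i == 1) ? TRITONSERVER_RESPONSE_COMPLETE_FINAL : 0;
    TRITONBACKEND_ResponseSend(response, flags, nullptr /* success */);
  }

  TRITONBACKEND_ResponseFactoryDelete(factory);
  // The request itself is released separately with
  // TRITONBACKEND_RequestRelease once the backend no longer needs it.
}
```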

## Build the Backend Utilities
2 changes: 1 addition & 1 deletion docs/backend_platform_support_matrix.md
@@ -79,7 +79,7 @@ The following backends are currently supported on Jetson JetPack:
| Python[^1] | :x: GPU <br/> :heavy_check_mark: CPU |


-Look at the [Triton Inference Server Support for Jetson and JetPack](https://github.com/triton-inference-server/server/blob/main/docs/jetson.md).
+Look at the [Triton Inference Server Support for Jetson and JetPack](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/jetson.md).


## AWS Inferentia
26 changes: 13 additions & 13 deletions examples/README.md
@@ -47,10 +47,10 @@ multiple responses per request.
[*stateful*](https://github.com/triton-inference-server/stateful_backend)
backend shows an example of how a backend can manage model state
tensors on the server-side for the [sequence
-batcher](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md#sequence-batcher)
+batcher](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#sequence-batcher)
to avoid transferring state tensors between client and server. Triton
also implements [Implicit State
-Management](https://github.com/triton-inference-server/server/blob/main/docs/architecture.md#implicit-state-management)
+Management](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/architecture.md#implicit-state-management)
which allows backends to behave in a stateless manner and leave the
state management to Triton.
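
As a rough illustration of that implicit-state flow, the hypothetical sketch below has the backend hand its updated sequence state to Triton through the TRITONBACKEND_State* calls so Triton can feed it back as the input state of the next request in the sequence. The tensor name "OUTPUT_STATE", the flat UINT8 layout, and the helper itself are assumptions for illustration; verify the exact signatures against tritonbackend.h.

```cpp
// Hypothetical sketch (assumptions noted above): write the new per-sequence
// state server-side instead of returning it to the client.
#include <cstring>

#include "triton/core/tritonbackend.h"

static TRITONSERVER_Error*
WriteOutputState(
    TRITONBACKEND_Request* request, const void* new_state, size_t state_bytes)
{
  // The name and shape here are illustrative; they would match the
  // sequence_batching state settings in the model configuration.
  const int64_t shape[1] = {static_cast<int64_t>(state_bytes)};
  TRITONBACKEND_State* state = nullptr;
  TRITONSERVER_Error* err = TRITONBACKEND_StateNew(
      &state, request, "OUTPUT_STATE", TRITONSERVER_TYPE_UINT8, shape,
      1 /* dims_count */);
  if (err != nullptr) {
    return err;
  }

  void* buffer = nullptr;
  TRITONSERVER_MemoryType memory_type = TRITONSERVER_MEMORY_CPU;
  int64_t memory_type_id = 0;
  err = TRITONBACKEND_StateBuffer(
      state, &buffer, state_bytes, &memory_type, &memory_type_id);
  if (err != nullptr) {
    return err;
  }

  std::memcpy(buffer, new_state, state_bytes);

  // Commit the new state; Triton passes it as the input state next time.
  return TRITONBACKEND_StateUpdate(state);
}
```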

@@ -74,9 +74,9 @@ backend and the use of the [Triton Backend
API](../README.md#triton-backend-api) and the backend
utilities. Before reading the source code, make sure you understand
the concepts associated with Triton backend abstractions
-[TRITONBACKEND_Backend](../README.md#tritonbackend-backend),
-[TRITONBACKEND_Model](../README.md#tritonbackend-model), and
-[TRITONBACKEND_ModelInstance](../README.md#tritonbackend-modelinstance).
+[TRITONBACKEND_Backend](../README.md#tritonbackend_backend),
+[TRITONBACKEND_Model](../README.md#tritonbackend_model), and
+[TRITONBACKEND_ModelInstance](../README.md#tritonbackend_modelinstance).

The *minimal* backend does not do any interesting operation, it simply
copies a single input tensor to a single output tensor, but it does
@@ -160,10 +160,10 @@ I1215 23:46:00.250284 68 server.cc:589]

The models are identical except that the *batching* model enables the
[dynamic
-batcher](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md#dynamic-batcher)
+batcher](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#dynamic-batcher)
and supports batch sizes up to 8. Note that the *batching* model sets
the [batch
-delay](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md#delayed-batching)
+delay](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#delayed-batching)
to 5 seconds so that the example client described below can
demonstrate how the *minimal* backend receives a batch of requests.
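
For readers who want to see how that batch arrives at the backend, below is a hypothetical C++ sketch of a `TRITONBACKEND_ModelInstanceExecute` body looping over the batched requests. It mirrors the copy-one-input-to-one-output idea of the *minimal* backend but is not its actual source code; the tensor handling in the middle is elided.

```cpp
// Hypothetical sketch (not the real minimal backend): each request that the
// dynamic batcher grouped into this execution shows up as one entry in
// 'requests'. Error handling and tensor copies are omitted for brevity.
#include "triton/core/tritonbackend.h"

TRITONSERVER_Error*
TRITONBACKEND_ModelInstanceExecute(
    TRITONBACKEND_ModelInstance* instance, TRITONBACKEND_Request** requests,
    const uint32_t request_count)
{
  (void)instance;  // unused in this sketch

  for (uint32_t r = 0; r < request_count; r++) {
    TRITONBACKEND_Request* request = requests[r];

    TRITONBACKEND_Response* response = nullptr;
    TRITONBACKEND_ResponseNew(&response, request);

    // ... read the request's input tensor, create a matching output on
    // 'response', and copy the data across ...

    TRITONBACKEND_ResponseSend(
        response, TRITONSERVER_RESPONSE_COMPLETE_FINAL, nullptr /* success */);
    TRITONBACKEND_RequestRelease(request, TRITONSERVER_REQUEST_RELEASE_ALL);
  }
  return nullptr;  // success
}
```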

@@ -231,9 +231,9 @@ backend and the use of the [Triton Backend
API](../README.md#triton-backend-api) and the backend
utilities. Before reading the source code, make sure you understand
the concepts associated with Triton backend abstractions
-[TRITONBACKEND_Backend](../README.md#tritonbackend-backend),
-[TRITONBACKEND_Model](../README.md#tritonbackend-model), and
-[TRITONBACKEND_ModelInstance](../README.md#tritonbackend-modelinstance).
+[TRITONBACKEND_Backend](../README.md#tritonbackend_backend),
+[TRITONBACKEND_Model](../README.md#tritonbackend_model), and
+[TRITONBACKEND_ModelInstance](../README.md#tritonbackend_modelinstance).

The *recommended* backend improves the [*minimal*
backend](#minimal-triton-backend) to include the following features
@@ -248,7 +248,7 @@ which should be present in any robust backend implementation:
* Uses the Triton backend metric APIs to record statistics about
requests executing in the backend. These metrics can then be queried
using the Triton
-[metrics](https://github.com/triton-inference-server/server/blob/main/docs/metrics.md)
+[metrics](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/metrics.md)
and
[statistics](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_statistics.md)
APIs (see the sketch below).
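
The sketch below shows one way a backend might report those per-request and per-batch statistics. The wrapper function, its name, and the batch size of 1 are assumptions for illustration; the TRITONBACKEND_ModelInstanceReport* signatures should be checked against tritonbackend.h.

```cpp
// Hypothetical sketch: report timings for one request so they appear in
// Triton's metrics and statistics endpoints. The four timestamps are
// nanoseconds from a monotonic clock, captured by the caller around its
// compute step.
#include <cstdint>

#include "triton/core/tritonbackend.h"

static void
ReportRequestStats(
    TRITONBACKEND_ModelInstance* instance, TRITONBACKEND_Request* request,
    uint64_t exec_start_ns, uint64_t compute_start_ns,
    uint64_t compute_end_ns, uint64_t exec_end_ns)
{
  // Per-request entry; 'true' marks the request as successful.
  TRITONBACKEND_ModelInstanceReportStatistics(
      instance, request, true /* success */, exec_start_ns, compute_start_ns,
      compute_end_ns, exec_end_ns);

  // One entry for the executed batch (a batch of 1 in this sketch).
  TRITONBACKEND_ModelInstanceReportBatchStatistics(
      instance, 1 /* batch_size */, exec_start_ns, compute_start_ns,
      compute_end_ns, exec_end_ns);
}
```
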
@@ -361,7 +361,7 @@ $ curl localhost:8002/metrics

The output will be metric values in Prometheus data format. The
[metrics
-documentation](https://github.com/triton-inference-server/server/blob/main/docs/metrics.md)
+documentation](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/metrics.md)
gives a description of these metric values.

@@ -400,7 +400,7 @@ enhance the capabilities of your backend.
#### Automatic Model Configuration Generation

[Automatic model configuration
-generation](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md#auto-generated-model-configuration)
+generation](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#auto-generated-model-configuration)
is enabled by the backend implementing the appropriate logic (for
example, in a function called AutoCompleteConfig) during
TRITONBACKEND_ModelInitialize. For the *recommended* backend you would
13 changes: 6 additions & 7 deletions examples/backends/bls/README.md
@@ -32,7 +32,7 @@ The [*BLS*](../bls) backend demonstrates using the in-process C-API to
execute inferences within the backend. This backend serves as an example for
backend developers implementing their own custom pipeline in C++.
For Python use cases, please refer to the
-[Business Logic Scripting](https://github.com/triton-inference-server/python_backend#business-logic-scripting)
+[Business Logic Scripting](https://github.com/triton-inference-server/python_backend/blob/main/README.md#business-logic-scripting)
section in the Python backend.

The source code for the *bls* backend is contained in
@@ -42,8 +42,7 @@ The source code for the *bls* backend is contained in
implementation. The content of this file is not BLS specific. It only includes
the required Triton backend functions that are standard for any backend
implementation. The BLS logic is set off in the
-[`TRITONBACKEND_ModelInstanceExecute`](./src/backend.cc#L316).
-function.
+`TRITONBACKEND_ModelInstanceExecute` function, at the line `bls_executor.Execute(requests[r], &responses[r]);`.

* [bls.h](./src/bls.h) is where the BLS logic (class `BLSExecutor`) of
this example is located. You can refer to this file to see how to interact with
@@ -55,7 +54,7 @@ are not BLS dependent are located.
The source code contains extensive documentation describing the operation of
the backend and the use of the
[Triton Backend API](../../../README.md#triton-backend-api) and the
-[Triton Server API](https://github.com/triton-inference-server/server/blob/main/docs/inference_protocols.md#in-process-triton-server-api).
+[Triton Server API](https://github.com/triton-inference-server/server/blob/main/docs/customization_guide/inference_protocols.md#in-process-triton-server-api).
Before reading the source code, make sure you understand
the concepts associated with Triton backend abstractions
[TRITONBACKEND_Backend](../../../README.md#tritonbackend_backend),
@@ -70,9 +69,9 @@ construct the final inference response object using these tensors.
There are some self-imposed limitations, made to keep this example simple:
1. This backend does not support batching.
-1. This backend does not support decoupled models.
-1. This backend does not support GPU tensors.
-1. The model configuraion should be strictly set as the comments described in
+2. This backend does not support decoupled models.
+3. This backend does not support GPU tensors.
+4. The model configuration should be set strictly as described in the comments in
[backend.cc](./src/backend.cc).

You can implement a custom backend that is not bound by the limitations
