Relocate perf analyzer documentation (triton-inference-server#5380)
* Relocate perf analyzer documentation

* Comments addressed
matthewkotila authored Feb 22, 2023
1 parent b8b9e1f commit 61ecbf9
Showing 12 changed files with 65 additions and 750 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -170,7 +170,8 @@ configuration](docs/user_guide/model_configuration.md) for the model.
[Backend-Platform Support Matrix](https://github.com/triton-inference-server/backend/blob/main/docs/backend_platform_support_matrix.md)
to learn which backends are supported on your target platform.
- Learn how to [optimize performance](docs/user_guide/optimization.md) using the
[Performance Analyzer](docs/user_guide/perf_analyzer.md) and
[Performance Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
and
[Model Analyzer](https://github.com/triton-inference-server/model_analyzer)
- Learn how to [manage loading and unloading models](docs/user_guide/model_management.md) in
Triton
10 changes: 8 additions & 2 deletions deploy/gke-marketplace-app/README.md
@@ -1,5 +1,5 @@
<!--
# Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2021-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -159,7 +159,13 @@ The client example push about ~650 QPS(Query per second) to Triton Server, and w

![Locust Client Chart](client.png)

Alternatively, user can opt to use [Perf Analyzer](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/perf_analyzer.md) to profile and study the performance of Triton Inference Server. Here we also provide a [client script](https://github.com/triton-inference-server/server/tree/master/deploy/gke-marketplace-app/client-sample/perf_analyzer_grpc.sh) to use Perf Analyzer to send gRPC to Triton Server GKE deployment. Perf Analyzer client requires user to use NGC Triton Client Container.
Alternatively, user can opt to use
[Perf Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
to profile and study the performance of Triton Inference Server. Here we also
provide a
[client script](https://github.com/triton-inference-server/server/tree/master/deploy/gke-marketplace-app/client-sample/perf_analyzer_grpc.sh)
to use Perf Analyzer to send gRPC to Triton Server GKE deployment. Perf Analyzer
client requires user to use NGC Triton Client Container.

```
bash perf_analyzer_grpc.sh ${INGRESS_HOST}:${INGRESS_PORT}
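# A minimal sketch of running the script from the NGC Triton Client (SDK)
# container; the image tag, mount path, and networking flags are illustrative
# placeholders, not part of the original script.
docker run --rm --net=host -v $(pwd):/workspace nvcr.io/nvidia/tritonserver:<xx.yy>-py3-sdk \
  bash /workspace/perf_analyzer_grpc.sh ${INGRESS_HOST}:${INGRESS_PORT}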
6 changes: 4 additions & 2 deletions deploy/k8s-onprem/README.md
@@ -1,5 +1,5 @@
<!--
# Copyright (c) 2018-2021, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -284,7 +284,9 @@ Image 'images/mug.jpg':
## Testing Load Balancing and Autoscaling
After you have confirmed that your Triton cluster is operational and can perform inference,
you can test the load balancing and autoscaling features by sending a heavy load of requests.
One option for doing this is using the [perf_analyzer](../../docs/user_guide/perf_analyzer.md) application.
One option for doing this is using the
[perf_analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
application.

You can apply a progressively increasing load with a command like:
```
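# A minimal sketch of applying a progressively increasing load; the model
# name, endpoint, and concurrency range are placeholders for your deployment.
perf_analyzer -m <model_name> -u <ingress_host>:<ingress_port> --concurrency-range 1:10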
9 changes: 6 additions & 3 deletions docs/README.md
@@ -1,5 +1,5 @@
<!--
# Copyright 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -166,7 +166,7 @@ Understanding Inference perfomance is key to better resource utilization. Use Tr
- [Performance Tuning Guide](user_guide/performance_tuning.md)
- [Optimization](user_guide/optimization.md)
- [Model Analyzer](user_guide/model_analyzer.md)
- [Performance Analyzer](user_guide/perf_analyzer.md)
- [Performance Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
- [Inference Request Tracing](user_guide/trace.md)
### Jetson and JetPack
Triton can be deployed on edge devices. Explore [resources](user_guide/jetson.md) and [examples](examples/jetson/README.md).
@@ -178,7 +178,10 @@ The following resources are recommended to explore the full suite of Triton Infe

- **Configuring Deployment**: Triton comes with three tools which can be used to configure deployment setting, measure performance and recommend optimizations.
- [Model Analyzer](https://github.com/triton-inference-server/model_analyzer) Model Analyzer is CLI tool built to recommend deployment configurations for Triton Inference Server based on user's Quality of Service Requirements. It also generates detailed reports about model performance to summarize the benefits and trade offs of different configurations.
- [Perf Analyzer](user_guide/perf_analyzer.md): Perf Analyzer is a CLI application built to generate inference requests and measures the latency of those requests and throughput of the model being served .
- [Perf Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md):
Perf Analyzer is a CLI application built to generate inference requests and
measures the latency of those requests and throughput of the model being
served.
- [Model Navigator](https://github.com/triton-inference-server/model_navigator):
The Triton Model Navigator is a tool that provides the ability to automate the process of moving model from source to optimal format and configuration for deployment on Triton Inference Server. The tool supports export model from source to all possible formats and applies the Triton Inference Server backend optimizations.

11 changes: 8 additions & 3 deletions docs/examples/jetson/README.md
@@ -1,5 +1,5 @@
<!--
# Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2021-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -52,12 +52,17 @@ Inference Server as a shared library.

## Part 2. Analyzing model performance with perf_analyzer

To analyze model performance on Jetson, [perf_analyzer](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/perf_analyzer.md) tool is used. The `perf_analyzer` is included in the release tar file or can be compiled from source.
To analyze model performance on Jetson,
[perf_analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
tool is used. The `perf_analyzer` is included in the release tar file or can be
compiled from source.

From this directory of the repository, execute the following to evaluate model performance:

```shell
./perf_analyzer -m peoplenet -b 2 --service-kind=triton_c_api --model-repo=$(pwd)/concurrency_and_dynamic_batching/trtis_model_repo_sample_1 --triton-server-directory=/opt/tritonserver --concurrency-range 1:6 -f perf_c_api.csv
```

In the example above we saved the results as a `.csv` file. To visualize these results, follow the steps described [here](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/perf_analyzer.md#visualizing-latency-vs-throughput).
In the example above we saved the results as a `.csv` file. To visualize these
results, follow the steps described
[here](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md#visualizing-latency-vs-throughput).
11 changes: 6 additions & 5 deletions docs/user_guide/faq.md
@@ -1,5 +1,5 @@
<!--
# Copyright 2019-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright 2019-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -98,10 +98,11 @@ ways: by [Prometheus metrics](metrics.md) and by the statistics
available through the [HTTP/REST, GRPC, and C
APIs](../customization_guide/inference_protocols.md).

A client application, [perf_analyzer](perf_analyzer.md), allows you to
measure the performance of an individual model using a synthetic
load. The perf_analyzer application is designed to show you the
tradeoff of latency vs. throughput.
A client application,
[perf_analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md),
allows you to measure the performance of an individual model using a synthetic
load. The perf_analyzer application is designed to show you the tradeoff of
latency vs. throughput.

## How can I fully utilize the GPU with Triton Inference Server?

8 changes: 5 additions & 3 deletions docs/user_guide/jetson.md
@@ -1,5 +1,5 @@
<!--
# Copyright 2021-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright 2021-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -196,8 +196,10 @@ tritonserver --model-repository=/path/to/model_repo --backend-directory=/path/to
--backend-config=tensorflow,version=2
```

**Note**: [perf_analyzer](perf_analyzer.md) is supported on Jetson, while the [model_analyzer](model_analyzer.md)
is currently not available for Jetson. To execute `perf_analyzer` for C API, use
**Note**:
[perf_analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
is supported on Jetson, while the [model_analyzer](model_analyzer.md) is
currently not available for Jetson. To execute `perf_analyzer` for C API, use
the CLI flag `--service-kind=triton_c_api`:

```shell
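# A minimal sketch; the model name and repository path are placeholders.
./perf_analyzer -m <model_name> --service-kind=triton_c_api \
  --triton-server-directory=/opt/tritonserver \
  --model-repo=/path/to/model_repo --concurrency-range 1:4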
18 changes: 9 additions & 9 deletions docs/user_guide/model_analyzer.md
@@ -1,5 +1,5 @@
<!--
# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2020-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -28,14 +28,14 @@

# Model Analyzer

The Triton Model Analyzer is a tool that uses [Performance
Analyzer](perf_analyzer.md) to send requests to your model while
measuring GPU memory and compute utilization. The Model Analyzer is
specifically useful for characterizing the GPU memory requirements for
your model under different batching and model instance
configurations. Once you have this GPU memory usage information you
can more intelligently decide on how to combine multiple models on the
same GPU while remaining within the memory capacity of the GPU.
The Triton Model Analyzer is a tool that uses
[Performance Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
to send requests to your model while measuring GPU memory and compute
utilization. The Model Analyzer is specifically useful for characterizing the
GPU memory requirements for your model under different batching and model
instance configurations. Once you have this GPU memory usage information you can
more intelligently decide on how to combine multiple models on the same GPU
while remaining within the memory capacity of the GPU.

For more information see the [Model Analyzer
repository](https://github.com/triton-inference-server/model_analyzer)
7 changes: 4 additions & 3 deletions docs/user_guide/model_configuration.md
@@ -813,9 +813,10 @@ dynamic batcher configurations.
dynamic_batching { }
```

* Use the [Performance Analyzer](perf_analyzer.md) to determine the
latency and throughput provided by the default dynamic batcher
configuration.
* Use the
[Performance Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
to determine the latency and throughput provided by the default dynamic
batcher configuration.

* If the default configuration results in latency values that are
within your latency budget, try one or both of the following to
7 changes: 4 additions & 3 deletions docs/user_guide/optimization.md
@@ -1,5 +1,5 @@
<!--
# Copyright (c) 2019-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2019-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -43,8 +43,9 @@ single GPU.

Unless you already have a client application suitable for measuring
the performance of your model on Triton, you should familiarize
yourself with [Performance Analyzer](perf_analyzer.md). The
Performance Analyzer is an essential tool for optimizing your model's
yourself with
[Performance Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md).
The Performance Analyzer is an essential tool for optimizing your model's
performance.

As a running example demonstrating the optimization features and