Relocate perf analyzer documentation (triton-inference-server#5380)
* Relocate perf analyzer documentation

* Comments addressed
matthewkotila authored Feb 22, 2023
1 parent b8b9e1f commit 61ecbf9
Showing 12 changed files with 65 additions and 750 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -170,7 +170,8 @@ configuration](docs/user_guide/model_configuration.md) for the model.
[Backend-Platform Support Matrix](https://github.com/triton-inference-server/backend/blob/main/docs/backend_platform_support_matrix.md)
to learn which backends are supported on your target platform.
- Learn how to [optimize performance](docs/user_guide/optimization.md) using the
[Performance Analyzer](docs/user_guide/perf_analyzer.md) and
[Performance Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
and
[Model Analyzer](https://github.com/triton-inference-server/model_analyzer)
- Learn how to [manage loading and unloading models](docs/user_guide/model_management.md) in
Triton
10 changes: 8 additions & 2 deletions deploy/gke-marketplace-app/README.md
@@ -1,5 +1,5 @@
<!--
# Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2021-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -159,7 +159,13 @@ The client example push about ~650 QPS(Query per second) to Triton Server, and w

![Locust Client Chart](client.png)

Alternatively, user can opt to use [Perf Analyzer](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/perf_analyzer.md) to profile and study the performance of Triton Inference Server. Here we also provide a [client script](https://github.com/triton-inference-server/server/tree/master/deploy/gke-marketplace-app/client-sample/perf_analyzer_grpc.sh) to use Perf Analyzer to send gRPC to Triton Server GKE deployment. Perf Analyzer client requires user to use NGC Triton Client Container.
Alternatively, user can opt to use
[Perf Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
to profile and study the performance of Triton Inference Server. Here we also
provide a
[client script](https://github.com/triton-inference-server/server/tree/master/deploy/gke-marketplace-app/client-sample/perf_analyzer_grpc.sh)
to use Perf Analyzer to send gRPC to Triton Server GKE deployment. Perf Analyzer
client requires user to use NGC Triton Client Container.

```
bash perf_analyzer_grpc.sh ${INGRESS_HOST}:${INGRESS_PORT}
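# A minimal sketch of running the script from the NGC Triton Client (SDK)
# container; the image tag, mount path, and networking flags are illustrative
# placeholders, not part of the original script.
docker run --rm --net=host -v $(pwd):/workspace nvcr.io/nvidia/tritonserver:<xx.yy>-py3-sdk \
  bash /workspace/perf_analyzer_grpc.sh ${INGRESS_HOST}:${INGRESS_PORT}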
6 changes: 4 additions & 2 deletions deploy/k8s-onprem/README.md
@@ -1,5 +1,5 @@
<!--
# Copyright (c) 2018-2021, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -284,7 +284,9 @@ Image 'images/mug.jpg':
## Testing Load Balancing and Autoscaling
After you have confirmed that your Triton cluster is operational and can perform inference,
you can test the load balancing and autoscaling features by sending a heavy load of requests.
One option for doing this is using the [perf_analyzer](../../docs/user_guide/perf_analyzer.md) application.
One option for doing this is using the
[perf_analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
application.

You can apply a progressively increasing load with a command like:
```
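# A minimal sketch of applying a progressively increasing load; the model
# name, endpoint, and concurrency range are placeholders for your deployment.
perf_analyzer -m <model_name> -u <ingress_host>:<ingress_port> --concurrency-range 1:10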
9 changes: 6 additions & 3 deletions docs/README.md
@@ -1,5 +1,5 @@
<!--
# Copyright 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -166,7 +166,7 @@ Understanding Inference perfomance is key to better resource utilization. Use Tr
- [Performance Tuning Guide](user_guide/performance_tuning.md)
- [Optimization](user_guide/optimization.md)
- [Model Analyzer](user_guide/model_analyzer.md)
- [Performance Analyzer](user_guide/perf_analyzer.md)
- [Performance Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
- [Inference Request Tracing](user_guide/trace.md)
### Jetson and JetPack
Triton can be deployed on edge devices. Explore [resources](user_guide/jetson.md) and [examples](examples/jetson/README.md).
@@ -178,7 +178,10 @@ The following resources are recommended to explore the full suite of Triton Infe

- **Configuring Deployment**: Triton comes with three tools which can be used to configure deployment setting, measure performance and recommend optimizations.
- [Model Analyzer](https://github.com/triton-inference-server/model_analyzer) Model Analyzer is CLI tool built to recommend deployment configurations for Triton Inference Server based on user's Quality of Service Requirements. It also generates detailed reports about model performance to summarize the benefits and trade offs of different configurations.
- [Perf Analyzer](user_guide/perf_analyzer.md): Perf Analyzer is a CLI application built to generate inference requests and measures the latency of those requests and throughput of the model being served .
- [Perf Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md):
Perf Analyzer is a CLI application built to generate inference requests and
measures the latency of those requests and throughput of the model being
served.
- [Model Navigator](https://github.com/triton-inference-server/model_navigator):
The Triton Model Navigator is a tool that provides the ability to automate the process of moving model from source to optimal format and configuration for deployment on Triton Inference Server. The tool supports export model from source to all possible formats and applies the Triton Inference Server backend optimizations.

11 changes: 8 additions & 3 deletions docs/examples/jetson/README.md
@@ -1,5 +1,5 @@
<!--
# Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2021-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -52,12 +52,17 @@ Inference Server as a shared library.

## Part 2. Analyzing model performance with perf_analyzer

To analyze model performance on Jetson, [perf_analyzer](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/perf_analyzer.md) tool is used. The `perf_analyzer` is included in the release tar file or can be compiled from source.
To analyze model performance on Jetson,
[perf_analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
tool is used. The `perf_analyzer` is included in the release tar file or can be
compiled from source.

From this directory of the repository, execute the following to evaluate model performance:

```shell
./perf_analyzer -m peoplenet -b 2 --service-kind=triton_c_api --model-repo=$(pwd)/concurrency_and_dynamic_batching/trtis_model_repo_sample_1 --triton-server-directory=/opt/tritonserver --concurrency-range 1:6 -f perf_c_api.csv
```

In the example above we saved the results as a `.csv` file. To visualize these results, follow the steps described [here](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/perf_analyzer.md#visualizing-latency-vs-throughput).
In the example above we saved the results as a `.csv` file. To visualize these
results, follow the steps described
[here](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md#visualizing-latency-vs-throughput).
11 changes: 6 additions & 5 deletions docs/user_guide/faq.md
@@ -1,5 +1,5 @@
<!--
# Copyright 2019-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright 2019-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -98,10 +98,11 @@ ways: by [Prometheus metrics](metrics.md) and by the statistics
available through the [HTTP/REST, GRPC, and C
APIs](../customization_guide/inference_protocols.md).

A client application, [perf_analyzer](perf_analyzer.md), allows you to
measure the performance of an individual model using a synthetic
load. The perf_analyzer application is designed to show you the
tradeoff of latency vs. throughput.
A client application,
[perf_analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md),
allows you to measure the performance of an individual model using a synthetic
load. The perf_analyzer application is designed to show you the tradeoff of
latency vs. throughput.

## How can I fully utilize the GPU with Triton Inference Server?

8 changes: 5 additions & 3 deletions docs/user_guide/jetson.md
@@ -1,5 +1,5 @@
<!--
# Copyright 2021-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright 2021-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -196,8 +196,10 @@ tritonserver --model-repository=/path/to/model_repo --backend-directory=/path/to
--backend-config=tensorflow,version=2
```

**Note**: [perf_analyzer](perf_analyzer.md) is supported on Jetson, while the [model_analyzer](model_analyzer.md)
is currently not available for Jetson. To execute `perf_analyzer` for C API, use
**Note**:
[perf_analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
is supported on Jetson, while the [model_analyzer](model_analyzer.md) is
currently not available for Jetson. To execute `perf_analyzer` for C API, use
the CLI flag `--service-kind=triton_c_api`:

```shell
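# A minimal sketch; the model name and repository path are placeholders.
./perf_analyzer -m <model_name> --service-kind=triton_c_api \
  --triton-server-directory=/opt/tritonserver \
  --model-repo=/path/to/model_repo --concurrency-range 1:4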
18 changes: 9 additions & 9 deletions docs/user_guide/model_analyzer.md
@@ -1,5 +1,5 @@
<!--
# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2020-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -28,14 +28,14 @@

# Model Analyzer

The Triton Model Analyzer is a tool that uses [Performance
Analyzer](perf_analyzer.md) to send requests to your model while
measuring GPU memory and compute utilization. The Model Analyzer is
specifically useful for characterizing the GPU memory requirements for
your model under different batching and model instance
configurations. Once you have this GPU memory usage information you
can more intelligently decide on how to combine multiple models on the
same GPU while remaining within the memory capacity of the GPU.
The Triton Model Analyzer is a tool that uses
[Performance Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
to send requests to your model while measuring GPU memory and compute
utilization. The Model Analyzer is specifically useful for characterizing the
GPU memory requirements for your model under different batching and model
instance configurations. Once you have this GPU memory usage information you can
more intelligently decide on how to combine multiple models on the same GPU
while remaining within the memory capacity of the GPU.

For more information see the [Model Analyzer
repository](https://github.com/triton-inference-server/model_analyzer)
7 changes: 4 additions & 3 deletions docs/user_guide/model_configuration.md
@@ -813,9 +813,10 @@ dynamic batcher configurations.
dynamic_batching { }
```

* Use the [Performance Analyzer](perf_analyzer.md) to determine the
latency and throughput provided by the default dynamic batcher
configuration.
* Use the
[Performance Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
to determine the latency and throughput provided by the default dynamic
batcher configuration.

* If the default configuration results in latency values that are
within your latency budget, try one or both of the following to
7 changes: 4 additions & 3 deletions docs/user_guide/optimization.md
@@ -1,5 +1,5 @@
<!--
# Copyright (c) 2019-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2019-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -43,8 +43,9 @@ single GPU.

Unless you already have a client application suitable for measuring
the performance of your model on Triton, you should familiarize
yourself with [Performance Analyzer](perf_analyzer.md). The
Performance Analyzer is an essential tool for optimizing your model's
yourself with
[Performance Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md).
The Performance Analyzer is an essential tool for optimizing your model's
performance.

As a running example demonstrating the optimization features and