From a2240c56282e7928d430d7260c860875f0e698a1 Mon Sep 17 00:00:00 2001 From: Katherine Yang <80359429+jbkyang-nvi@users.noreply.github.com> Date: Wed, 5 Oct 2022 14:52:59 -0700 Subject: [PATCH] fix broken links in server (#4926) fix broken links in documentation and add tests for backend --- CONTRIBUTING.md | 4 +- README.md | 8 +-- deploy/aws/README.md | 4 +- deploy/fleetcommand/README.md | 4 +- deploy/gcp/README.md | 4 +- deploy/gke-marketplace-app/README.md | 13 ++--- deploy/k8s-onprem/README.md | 10 ++-- deploy/mlflow-triton-plugin/README.md | 6 +-- docs/README.md | 4 +- docs/examples/jetson/README.md | 4 +- .../README.md | 2 +- docs/getting_started/quickstart.md | 2 +- docs/user_guide/architecture.md | 5 +- docs/user_guide/faq.md | 8 +-- docs/user_guide/model_configuration.md | 2 +- docs/user_guide/optimization.md | 2 +- qa/L0_doc_links/mkdocs.yml | 20 ++++++++ qa/L0_doc_links/test.sh | 51 +++++++++++++++++++ 18 files changed, 113 insertions(+), 40 deletions(-) create mode 100644 qa/L0_doc_links/mkdocs.yml create mode 100644 qa/L0_doc_links/test.sh diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index e0f5e91c74..1dd25c8192 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -47,7 +47,7 @@ proposed change so that the Triton team can provide feedback. will provide guidance about how and where your enhancement should be implemented. -- [Testing](docs/test.md) is a critical part of any Triton +- [Testing](docs/customization_guide/test.md) is a critical part of any Triton enhancement. You should plan on spending significant time on creating tests for your change. The Triton team will help you to design your testing so that it is compatible with existing testing @@ -84,7 +84,7 @@ proposed change so that the Triton team can provide feedback. - Make sure all `L0_*` tests pass: - In the `qa/` directory, there are basic sanity tests scripted in - directories named `L0_...`. See the [Test](docs/test.md) + directories named `L0_...`. See the [Test](docs/customization_guide/test.md) documentation for instructions on running these tests. - Triton Inference Server's default build assumes recent versions of diff --git a/README.md b/README.md index def3223b79..a910806afd 100644 --- a/README.md +++ b/README.md @@ -125,8 +125,8 @@ contains additional documentation, presentations, and examples. The recommended way to build and use Triton Inference Server is with Docker images. 
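For readers who want to try the Docker flow that these links describe, a minimal sketch follows (the `22.09-py3` tag is only an assumed example; substitute the `<xx.yy>-py3` release you actually want and your own model repository path):

```
# Pull a Triton release image from NGC (tag shown is an assumed example)
docker pull nvcr.io/nvidia/tritonserver:22.09-py3

# Serve a local model repository; ports follow the quickstart (HTTP, GRPC, metrics)
docker run --gpus=all --rm \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /full/path/to/model_repository:/models \
  nvcr.io/nvidia/tritonserver:22.09-py3 \
  tritonserver --model-repository=/models
```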
-- [Install Triton Inference Server with Docker containers](docs/customization_guide/build.md#building-triton-with-docker) (*Recommended*) -- [Install Triton Inference Server without Docker containers](docs/customization_guide/build.md#building-triton-without-docker) +- [Install Triton Inference Server with Docker containers](docs/customization_guide/build.md#building-with-docker) (*Recommended*) +- [Install Triton Inference Server without Docker containers](docs/customization_guide/build.md#building-without-docker) - [Build a custom Triton Inference Server Docker container](docs/customization_guide/compose.md) - [Build Triton Inference Server from source](docs/customization_guide/build.md#building-on-unsupported-platforms) - [Build Triton Inference Server for Windows 10](docs/customization_guide/build.md#building-for-windows-10) @@ -213,8 +213,8 @@ designed for modularity and flexibility ### Additional Documentation - [FAQ](docs/user_guide/faq.md) -- [User Guide](docs#user-guide) -- [Developer Guide](docs#developer-guide) +- [User Guide](docs/README.md#user-guide) +- [Customization Guide](docs/README.md#customization-guide) - [Release Notes](https://docs.nvidia.com/deeplearning/triton-inference-server/release-notes/index.html) - [GPU, Driver, and CUDA Support Matrix](https://docs.nvidia.com/deeplearning/dgx/support-matrix/index.html) diff --git a/deploy/aws/README.md b/deploy/aws/README.md index aecb7107fd..600f8c953f 100644 --- a/deploy/aws/README.md +++ b/deploy/aws/README.md @@ -98,7 +98,7 @@ in an AWS S3 Storage bucket. $ aws s3 mb s3://triton-inference-server-repository ``` -Following the [QuickStart](../../docs/quickstart.md) download the +Following the [QuickStart](../../docs/getting_started/quickstart.md) download the example model repository to your system and copy it into the AWS S3 bucket. @@ -218,7 +218,7 @@ from the HTTP endpoint. $ curl 34.83.9.133:8000/v2 ``` -Follow the [QuickStart](../../docs/quickstart.md) to get the example +Follow the [QuickStart](../../docs/getting_started/quickstart.md) to get the example image classification client that can be used to perform inferencing using image classification models being served by the inference server. For example, diff --git a/deploy/fleetcommand/README.md b/deploy/fleetcommand/README.md index cda8457ce5..88a05af34b 100644 --- a/deploy/fleetcommand/README.md +++ b/deploy/fleetcommand/README.md @@ -62,7 +62,7 @@ For this example you will place the model repository in an S3 Storage bucket $ aws s3 mb s3://triton-inference-server-repository ``` -Following the [QuickStart](../../docs/quickstart.md) download the example model +Following the [QuickStart](../../docs/getting_started/quickstart.md) download the example model repository to your system and copy it into the AWS S3 bucket. ``` @@ -136,7 +136,7 @@ location has the IP `34.83.9.133`: $ curl 34.83.9.133:30343/v2 ``` -Follow the [QuickStart](../../docs/quickstart.md) to get the example image +Follow the [QuickStart](../../docs/getting_started/quickstart.md) to get the example image classification client that can be used to perform inferencing using image classification models being served by the Triton. For example, diff --git a/deploy/gcp/README.md b/deploy/gcp/README.md index 67baa1970d..0530df412e 100644 --- a/deploy/gcp/README.md +++ b/deploy/gcp/README.md @@ -103,7 +103,7 @@ in a Google Cloud Storage bucket. 
$ gsutil mb gs://triton-inference-server-repository ``` -Following the [QuickStart](../../docs/quickstart.md) download the +Following the [QuickStart](../../docs/getting_started/quickstart.md) download the example model repository to your system and copy it into the GCS bucket. @@ -256,7 +256,7 @@ from the HTTP endpoint. $ curl 34.83.9.133:8000/v2 ``` -Follow the [QuickStart](../../docs/quickstart.md) to get the example +Follow the [QuickStart](../../docs/getting_started/quickstart.md) to get the example image classification client that can be used to perform inferencing using image classification models being served by the inference server. For example, diff --git a/deploy/gke-marketplace-app/README.md b/deploy/gke-marketplace-app/README.md index 1dd9302f79..e0cf652fa9 100644 --- a/deploy/gke-marketplace-app/README.md +++ b/deploy/gke-marketplace-app/README.md @@ -29,11 +29,12 @@ # NVIDIA Triton Inference Server GKE Marketplace Application **Table Of Contents** -- [Description](#description) -- [Prerequisites](#prerequisites) -- [Demo Instruction](#demo-instruction) -- [Additional Resources](#additional-resources) -- [Known Issues](#known-issues) +- [NVIDIA Triton Inference Server GKE Marketplace Application](#nvidia-triton-inference-server-gke-marketplace-application) + - [Description](#description) + - [Prerequisites](#prerequisites) + - [Demo Instruction](#demo-instruction) + - [Additional Resources](#additional-resources) + - [Known Issues](#known-issues) ## Description @@ -145,7 +146,7 @@ The client example push about ~650 QPS(Query per second) to Triton Server, and w ![Locust Client Chart](client.png) -Alternatively, user can opt to use [Perf Analyzer](https://github.com/triton-inference-server/server/blob/master/docs/perf_analyzer.md) to profile and study the performance of Triton Inference Server. Here we also provide a [client script](https://github.com/triton-inference-server/server/tree/master/deploy/gke-marketplace-app/client-sample/perf_analyzer_grpc.sh) to use Perf Analyzer to send gRPC to Triton Server GKE deployment. Perf Analyzer client requires user to use NGC Triton Client Container. +Alternatively, user can opt to use [Perf Analyzer](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/perf_analyzer.md) to profile and study the performance of Triton Inference Server. Here we also provide a [client script](https://github.com/triton-inference-server/server/tree/master/deploy/gke-marketplace-app/client-sample/perf_analyzer_grpc.sh) to use Perf Analyzer to send gRPC to Triton Server GKE deployment. Perf Analyzer client requires user to use NGC Triton Client Container. ``` bash perf_analyzer_grpc.sh ${INGRESS_HOST}:${INGRESS_PORT} diff --git a/deploy/k8s-onprem/README.md b/deploy/k8s-onprem/README.md index 935e6fd82d..fcc48b1028 100644 --- a/deploy/k8s-onprem/README.md +++ b/deploy/k8s-onprem/README.md @@ -112,10 +112,10 @@ $ git clone https://github.com/triton-inference-server/server.git Triton Server needs a repository of models that it will make available for inferencing. For this example, we are using an existing NFS server and placing our model files there. See the -[Model Repository documentation](../../docs/model_repository.md) for other +[Model Repository documentation](../../docs/user_guide/model_repository.md) for other supported locations. 
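As a quick orientation for the next step, the repository you copy onto the NFS share follows the standard layout described in the model repository documentation; a sketch using the quickstart's `densenet_onnx` model as an assumed example:

```
$ tree model_repository
model_repository
└── densenet_onnx
    ├── config.pbtxt
    ├── densenet_labels.txt
    └── 1
        └── model.onnx
```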
-Following the [QuickStart](../../docs/quickstart.md), download the +Following the [QuickStart](../../docs/getting_started/quickstart.md), download the example model repository to your system and copy it onto your NFS server. Then, add the url or IP address of your NFS server and the server path of your model repository to `values.yaml`. @@ -237,7 +237,7 @@ $ helm install example -f config.yaml . ## Using Triton Inference Server Now that the inference server is running you can send HTTP or GRPC -requests to it to perform inferencing. By default, this chart deploys [Traefik](traefik.io) +requests to it to perform inferencing. By default, this chart deploys [Traefik](https://traefik.io/) and uses [IngressRoutes](https://doc.traefik.io/traefik/providers/kubernetes-crd/) to balance requests across all available nodes. @@ -267,7 +267,7 @@ from the HTTP endpoint. $ curl $cluster_ip:8000/v2 ``` -Follow the [QuickStart](../../docs/quickstart.md) to get the example +Follow the [QuickStart](../../docs/getting_started/quickstart.md) to get the example image classification client that can be used to perform inferencing using image classification models on the inference server. For example, @@ -284,7 +284,7 @@ Image 'images/mug.jpg': ## Testing Load Balancing and Autoscaling After you have confirmed that your Triton cluster is operational and can perform inference, you can test the load balancing and autoscaling features by sending a heavy load of requests. -One option for doing this is using the [perf_analyzer](../../docs/perf_analyzer.md) application. +One option for doing this is using the [perf_analyzer](../../docs/user_guide/perf_analyzer.md) application. You can apply a progressively increasing load with a command like: ``` diff --git a/deploy/mlflow-triton-plugin/README.md b/deploy/mlflow-triton-plugin/README.md index 6c8827254b..f8065d96ac 100644 --- a/deploy/mlflow-triton-plugin/README.md +++ b/deploy/mlflow-triton-plugin/README.md @@ -66,7 +66,7 @@ OUTPUT1 is the element-wise subtraction of INPUT0 and INPUT1. ### Start Triton Inference Server in EXPLICIT mode The MLflow Triton plugin must work with a running Triton server, see -[documentation](https://github.com/triton-inference-server/server/blob/main/docs/quickstart.md) +[documentation](https://github.com/triton-inference-server/server/blob/main/docs/getting_started/quickstart.md) of Triton Inference Server for how to start the server. Note that the server should be run in EXPLICIT mode (`--model-control-mode=explicit`) to exploit the deployment feature of the plugin. @@ -83,7 +83,7 @@ can interact with the server properly: The MLFlow ONNX built-in functionalities can be used to publish `onnx` flavor models to MLFlow directly, and the MLFlow Triton plugin will prepare the model to the format expected by Triton. You may also log -[`config.pbtxt`](](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_model_configuration.md)) +[`config.pbtxt`](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_model_configuration.md) as additonal artifact which Triton will be used to serve the model. Otherwise, the server should be run with auto-complete feature enabled (`--strict-model-config=false`) to generate the model configuration. @@ -101,7 +101,7 @@ For other model frameworks that Triton supports but not yet recognized by the MLFlow Triton plugin, the `publish_model_to_mlflow.py` script can be used to publish `triton` flavor models to MLflow. 
A `triton` flavor model is a directory containing the model files following the -[model layout](https://github.com/triton-inference-server/server/blob/main/docs/model_repository.md#repository-layout). +[model layout](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_repository.md#repository-layout). Below is an example usage: ``` diff --git a/docs/README.md b/docs/README.md index 5aa0b76acd..3fefa20436 100644 --- a/docs/README.md +++ b/docs/README.md @@ -52,7 +52,7 @@ Where \ is the version of Triton that you want to pull. For a complete l This guide covers the simplest possible workflow for deploying a model using a Triton Inference Server. - [Create a Model Repository](getting_started/quickstart.md#create-a-model-repository) - [Launch Triton](getting_started/quickstart.md#launch-triton) -- [Send an Inference Request](getting_started/quickstart.md#sending-an-inference-request) +- [Send an Inference Request](getting_started/quickstart.md#send-an-inference-request) Triton Inference Server has a considerable list versatile and powerful features. All new users are recommended to explore the [User Guide](README.md#user-guide) and the [additional resources](README.md#resources) sections for features most relevant to their use case. @@ -122,7 +122,7 @@ Triton supports batching individual inference requests to improve compute resour - [Stateful Models](user_guide/architecture.md#stateful-models) - [Control Inputs](user_guide/architecture.md#control-inputs) - [Implicit State - Stateful Inference Using a Stateless Model](user_guide/architecture.md#implicit-state-management) - - [Sequence Scheduling Strategies](user_guide/architecture.md#scheduling-strateties) + - [Sequence Scheduling Strategies](user_guide/architecture.md#scheduling-strategies) - [Direct](user_guide/architecture.md#direct) - [Oldest](user_guide/architecture.md#oldest) diff --git a/docs/examples/jetson/README.md b/docs/examples/jetson/README.md index fcd28e6c59..57a5649fd1 100644 --- a/docs/examples/jetson/README.md +++ b/docs/examples/jetson/README.md @@ -51,7 +51,7 @@ Inference Server as a shared library. ## Part 2. Analyzing model performance with perf_analyzer -To analyze model performance on Jetson, [perf_analyzer](https://github.com/triton-inference-server/server/blob/main/docs/perf_analyzer.md) tool is used. The `perf_analyzer` is included in the release tar file or can be compiled from source. +To analyze model performance on Jetson, [perf_analyzer](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/perf_analyzer.md) tool is used. The `perf_analyzer` is included in the release tar file or can be compiled from source. From this directory of the repository, execute the following to evaluate model performance: @@ -59,4 +59,4 @@ From this directory of the repository, execute the following to evaluate model p ./perf_analyzer -m peoplenet -b 2 --service-kind=triton_c_api --model-repo=$(pwd)/concurrency_and_dynamic_batching/trtis_model_repo_sample_1 --triton-server-directory=/opt/tritonserver --concurrency-range 1:6 -f perf_c_api.csv ``` -In the example above we saved the results as a `.csv` file. To visualize these results, follow the steps described [here](https://github.com/triton-inference-server/server/blob/main/docs/perf_analyzer.md#visualizing-latency-vs-throughput). +In the example above we saved the results as a `.csv` file. 
To visualize these results, follow the steps described [here](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/perf_analyzer.md#visualizing-latency-vs-throughput). diff --git a/docs/examples/jetson/concurrency_and_dynamic_batching/README.md b/docs/examples/jetson/concurrency_and_dynamic_batching/README.md index 310f58a9c0..ad3c473dfb 100644 --- a/docs/examples/jetson/concurrency_and_dynamic_batching/README.md +++ b/docs/examples/jetson/concurrency_and_dynamic_batching/README.md @@ -326,6 +326,6 @@ dynamic_batching { } ``` -To try further options of dynamic batcher see the [documentation](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md#dynamic-batcher). +To try further options of dynamic batcher see the [documentation](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#dynamic-batcher). You can also try enabling both concurrent model execution and dynamic batching. \ No newline at end of file diff --git a/docs/getting_started/quickstart.md b/docs/getting_started/quickstart.md index 228bfe40bd..4abb9646d4 100644 --- a/docs/getting_started/quickstart.md +++ b/docs/getting_started/quickstart.md @@ -125,7 +125,7 @@ $ curl -v localhost:8000/v2/health/ready The HTTP request returns status 200 if Triton is ready and non-200 if it is not ready. -## Send an Infernce Request +## Send an Inference Request Use docker pull to get the client libraries and examples image from NGC. diff --git a/docs/user_guide/architecture.md b/docs/user_guide/architecture.md index f897708358..094eb8fe0c 100644 --- a/docs/user_guide/architecture.md +++ b/docs/user_guide/architecture.md @@ -311,8 +311,9 @@ description of the model contains variable-sized dimensions, Triton will use *1* for every variable-sized dimension for the starting request. For other non-starting requests in the sequence, the input state is the output state of the previous request in the sequence. For an example ONNX model that uses -implicit state you can refer to -[this ONNX model](../../qa/common/gen_qa_implicit_models.py#L101). +implicit state you can refer to this onnx model generated from the +`create_onnx_modelfile_wo_initial_state()` +[from this generation script](../../qa/common/gen_qa_implicit_models.py). This is a simple accumulator model that stores the partial sum of the requests in a sequence in Triton using implicit state. For state initialization, if the request is starting, the model sets the "OUTPUT\_STATE" to be equal to the diff --git a/docs/user_guide/faq.md b/docs/user_guide/faq.md index 46e63e5e55..92b14ecb1a 100644 --- a/docs/user_guide/faq.md +++ b/docs/user_guide/faq.md @@ -35,7 +35,7 @@ same as when using the model's framework directly. However, with Triton you get benefits like [concurrent model execution](architecture.md#concurrent-model-execution) (the ability to run multiple models at the same time on the same GPU) and [dynamic -batching](architecture.md#dynamic-batcher) to get better +batching](model_configuration.md#dynamic-batcher) to get better throughput. You can also [replace or upgrade models while Triton and client application are running](model_management.md). Another benefit is that Triton can be deployed as a Docker container, anywhere – on @@ -84,7 +84,7 @@ library to suit your specific needs. In an AWS environment, the Triton Inference Server docker container can run on [CPU-only instances or GPU compute -instances](../getting_started/quickstart.md#run-triton). 
Triton can run directly on the +instances](../getting_started/quickstart.md#launch-triton). Triton can run directly on the compute instance or inside Elastic Kubernetes Service (EKS). In addition, other AWS services such as Elastic Load Balancer (ELB) can be used for load balancing traffic among multiple Triton @@ -121,13 +121,13 @@ concurrency](model_configuration.md#instance-groups) on a model-by-model basis. * Triton can [batch together multiple inference requests into a single - inference execution](architecture.md#dynamic-batcher). Typically, + inference execution](model_configuration.md#dynamic-batcher). Typically, batching inference requests leads to much higher thoughput with only a relatively small increase in latency. As a general rule, batching is the most beneficial way to increase GPU utilization. So you should always try enabling the [dynamic -batcher](architecture.md#dynamic-batcher) with your models. Using +batcher](model_configuration.md#dynamic-batcher) with your models. Using multiple instances of a model can also provide some benefit but is typically most useful for models that have small compute requirements. Most models will benefit from using two instances but diff --git a/docs/user_guide/model_configuration.md b/docs/user_guide/model_configuration.md index e57a284e06..b22fa1a77a 100644 --- a/docs/user_guide/model_configuration.md +++ b/docs/user_guide/model_configuration.md @@ -867,7 +867,7 @@ maximum batch size allowed by the model (but see the following section for the delay option that changes this behavior). The size of generated batches can be examined in aggregate using -[count metrics](metrics.md#count-metrics). +[count metrics](metrics.md#inference-request-metrics). #### Delayed Batching diff --git a/docs/user_guide/optimization.md b/docs/user_guide/optimization.md index 2b37259687..a7b912bf36 100644 --- a/docs/user_guide/optimization.md +++ b/docs/user_guide/optimization.md @@ -80,7 +80,7 @@ latency. For most models, the Triton feature that provides the largest performance improvement is [dynamic -batching](architecture.md#dynamic-batcher). If your model does not +batching](model_configuration.md#dynamic-batcher). If your model does not support batching then you can skip ahead to [Model Instances](#model-instances). diff --git a/qa/L0_doc_links/mkdocs.yml b/qa/L0_doc_links/mkdocs.yml new file mode 100644 index 0000000000..c2dfbea67c --- /dev/null +++ b/qa/L0_doc_links/mkdocs.yml @@ -0,0 +1,20 @@ +# Copyright (c) 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +site_name: CI Test +use_directory_urls: False +docs_dir: "./repos" +plugins: + - htmlproofer + - search diff --git a/qa/L0_doc_links/test.sh b/qa/L0_doc_links/test.sh new file mode 100644 index 0000000000..8a42d3748c --- /dev/null +++ b/qa/L0_doc_links/test.sh @@ -0,0 +1,51 @@ +# Copyright (c) 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +LOG="`pwd`/log.txt" +CONFIG="`pwd`/mkdocs.yml" +RET=0 +# Download necessary packages +python3 -m pip install mkdocs +python3 -m pip install mkdocs-htmlproofer-plugin + +# Get the necessary repos +mkdir repos && cd repos +TRITON_BACKEND_REPO_TAG=${TRITON_BACKEND_REPO_TAG:="main"} +echo ${TRITON_BACKEND_REPO_TAG} +git clone --single-branch --depth=1 -b ${TRITON_BACKEND_REPO_TAG} https://github.com/triton-inference-server/backend.git +cd .. + +exec mkdocs serve -f $CONFIG > $LOG & +PID=$! +# Time for the compilation to finish. This needs to be increased if other repos +# are added to the test +sleep 20 + +until [[ (-z `pgrep mkdocs`) ]]; do + kill -2 $PID + sleep 2 +done + +if [[ ! -z `grep "invalid url" $LOG` ]]; then + cat $LOG + RET=1 +fi + + +if [ $RET -eq 0 ]; then + echo -e "\n***\n*** Test PASSED\n***" +else + echo -e "\n***\n*** Test FAILED\n***" +fi +# exit $RET
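For anyone who wants to exercise the new check outside of CI, a rough local invocation is sketched below (paths assumed, run from a checkout of this repository). Note that the final `exit $RET` is commented out, so rely on the `Test PASSED` / `Test FAILED` message and `log.txt` rather than the shell exit code.

```
# Run the doc link check locally (sketch; assumes python3, pip, and git are available)
cd qa/L0_doc_links
bash test.sh

# mkdocs/htmlproofer output is captured in log.txt; broken links are reported as "invalid url"
grep "invalid url" log.txt || echo "No broken links reported"
```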