
Commit

fix broken links in server (triton-inference-server#4926)
fix broken links in documentation and add tests for backend
jbkyang-nvi committed Oct 5, 2022
1 parent d293325 commit a2240c5
Showing 18 changed files with 113 additions and 40 deletions.
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -47,7 +47,7 @@ proposed change so that the Triton team can provide feedback.
will provide guidance about how and where your enhancement should be
implemented.

- [Testing](docs/test.md) is a critical part of any Triton
- [Testing](docs/customization_guide/test.md) is a critical part of any Triton
enhancement. You should plan on spending significant time on
creating tests for your change. The Triton team will help you to
design your testing so that it is compatible with existing testing
@@ -84,7 +84,7 @@ proposed change so that the Triton team can provide feedback.
- Make sure all `L0_*` tests pass:

- In the `qa/` directory, there are basic sanity tests scripted in
directories named `L0_...`. See the [Test](docs/test.md)
directories named `L0_...`. See the [Test](docs/customization_guide/test.md)
documentation for instructions on running these tests.

- Triton Inference Server's default build assumes recent versions of
8 changes: 4 additions & 4 deletions README.md
@@ -125,8 +125,8 @@ contains additional documentation, presentations, and examples.
The recommended way to build and use Triton Inference Server is with Docker
images.

- [Install Triton Inference Server with Docker containers](docs/customization_guide/build.md#building-triton-with-docker) (*Recommended*)
- [Install Triton Inference Server without Docker containers](docs/customization_guide/build.md#building-triton-without-docker)
- [Install Triton Inference Server with Docker containers](docs/customization_guide/build.md#building-with-docker) (*Recommended*)
- [Install Triton Inference Server without Docker containers](docs/customization_guide/build.md#building-without-docker)
- [Build a custom Triton Inference Server Docker container](docs/customization_guide/compose.md)
- [Build Triton Inference Server from source](docs/customization_guide/build.md#building-on-unsupported-platforms)
- [Build Triton Inference Server for Windows 10](docs/customization_guide/build.md#building-for-windows-10)
@@ -213,8 +213,8 @@ designed for modularity and flexibility
### Additional Documentation

- [FAQ](docs/user_guide/faq.md)
- [User Guide](docs#user-guide)
- [Developer Guide](docs#developer-guide)
- [User Guide](docs/README.md#user-guide)
- [Customization Guide](docs/README.md#customization-guide)
- [Release Notes](https://docs.nvidia.com/deeplearning/triton-inference-server/release-notes/index.html)
- [GPU, Driver, and CUDA Support
Matrix](https://docs.nvidia.com/deeplearning/dgx/support-matrix/index.html)
4 changes: 2 additions & 2 deletions deploy/aws/README.md
@@ -98,7 +98,7 @@ in an AWS S3 Storage bucket.
$ aws s3 mb s3://triton-inference-server-repository
```

Following the [QuickStart](../../docs/quickstart.md) download the
Following the [QuickStart](../../docs/getting_started/quickstart.md) download the
example model repository to your system and copy it into the AWS S3
bucket.
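For reference, the copy step is typically a recursive upload of the example repository; the local path below is illustrative and depends on where you downloaded the models:

```
$ aws s3 cp --recursive docs/examples/model_repository \
    s3://triton-inference-server-repository/model_repository
```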

@@ -218,7 +218,7 @@ from the HTTP endpoint.
$ curl 34.83.9.133:8000/v2
```

Follow the [QuickStart](../../docs/quickstart.md) to get the example
Follow the [QuickStart](../../docs/getting_started/quickstart.md) to get the example
image classification client that can be used to perform inferencing
using image classification models being served by the inference
server. For example,
4 changes: 2 additions & 2 deletions deploy/fleetcommand/README.md
@@ -62,7 +62,7 @@ For this example you will place the model repository in an S3 Storage bucket
$ aws s3 mb s3://triton-inference-server-repository
```

Following the [QuickStart](../../docs/quickstart.md) download the example model
Following the [QuickStart](../../docs/getting_started/quickstart.md) download the example model
repository to your system and copy it into the AWS S3 bucket.

```
@@ -136,7 +136,7 @@ location has the IP `34.83.9.133`:
$ curl 34.83.9.133:30343/v2
```

Follow the [QuickStart](../../docs/quickstart.md) to get the example image
Follow the [QuickStart](../../docs/getting_started/quickstart.md) to get the example image
classification client that can be used to perform inferencing using image
classification models being served by Triton. For example,

4 changes: 2 additions & 2 deletions deploy/gcp/README.md
@@ -103,7 +103,7 @@ in a Google Cloud Storage bucket.
$ gsutil mb gs://triton-inference-server-repository
```

Following the [QuickStart](../../docs/quickstart.md) download the
Following the [QuickStart](../../docs/getting_started/quickstart.md) download the
example model repository to your system and copy it into the GCS
bucket.
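As a sketch, the copy into GCS is a recursive gsutil upload (the local path is illustrative):

```
$ gsutil cp -r docs/examples/model_repository gs://triton-inference-server-repository/model_repository
```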

@@ -256,7 +256,7 @@ from the HTTP endpoint.
$ curl 34.83.9.133:8000/v2
```

Follow the [QuickStart](../../docs/quickstart.md) to get the example
Follow the [QuickStart](../../docs/getting_started/quickstart.md) to get the example
image classification client that can be used to perform inferencing
using image classification models being served by the inference
server. For example,
13 changes: 7 additions & 6 deletions deploy/gke-marketplace-app/README.md
@@ -29,11 +29,12 @@
# NVIDIA Triton Inference Server GKE Marketplace Application

**Table Of Contents**
- [Description](#description)
- [Prerequisites](#prerequisites)
- [Demo Instruction](#demo-instruction)
- [Additional Resources](#additional-resources)
- [Known Issues](#known-issues)
- [NVIDIA Triton Inference Server GKE Marketplace Application](#nvidia-triton-inference-server-gke-marketplace-application)
- [Description](#description)
- [Prerequisites](#prerequisites)
- [Demo Instruction](#demo-instruction)
- [Additional Resources](#additional-resources)
- [Known Issues](#known-issues)

## Description

@@ -145,7 +146,7 @@ The client example push about ~650 QPS(Query per second) to Triton Server, and w

![Locust Client Chart](client.png)

Alternatively, user can opt to use [Perf Analyzer](https://github.com/triton-inference-server/server/blob/master/docs/perf_analyzer.md) to profile and study the performance of Triton Inference Server. Here we also provide a [client script](https://github.com/triton-inference-server/server/tree/master/deploy/gke-marketplace-app/client-sample/perf_analyzer_grpc.sh) to use Perf Analyzer to send gRPC to Triton Server GKE deployment. Perf Analyzer client requires user to use NGC Triton Client Container.
Alternatively, user can opt to use [Perf Analyzer](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/perf_analyzer.md) to profile and study the performance of Triton Inference Server. Here we also provide a [client script](https://github.com/triton-inference-server/server/tree/master/deploy/gke-marketplace-app/client-sample/perf_analyzer_grpc.sh) to use Perf Analyzer to send gRPC to Triton Server GKE deployment. Perf Analyzer client requires user to use NGC Triton Client Container.

```
bash perf_analyzer_grpc.sh ${INGRESS_HOST}:${INGRESS_PORT}
10 changes: 5 additions & 5 deletions deploy/k8s-onprem/README.md
@@ -112,10 +112,10 @@ $ git clone https://github.com/triton-inference-server/server.git
Triton Server needs a repository of models that it will make available
for inferencing. For this example, we are using an existing NFS server and
placing our model files there. See the
[Model Repository documentation](../../docs/model_repository.md) for other
[Model Repository documentation](../../docs/user_guide/model_repository.md) for other
supported locations.

Following the [QuickStart](../../docs/quickstart.md), download the
Following the [QuickStart](../../docs/getting_started/quickstart.md), download the
example model repository to your system and copy it onto your NFS server.
Then, add the url or IP address of your NFS server and the server path of your
model repository to `values.yaml`.
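A minimal sketch of those entries is shown below; the key names are illustrative placeholders, so match them against the fields actually defined in this chart's `values.yaml`:

```
$ cat >> values.yaml <<'EOF'
# Illustrative keys -- use the field names defined by this chart
modelRepositoryServer: 10.0.0.5
modelRepositoryPath: /srv/triton/model_repository
EOF
```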
@@ -237,7 +237,7 @@ $ helm install example -f config.yaml .
## Using Triton Inference Server

Now that the inference server is running you can send HTTP or GRPC
requests to it to perform inferencing. By default, this chart deploys [Traefik](traefik.io)
requests to it to perform inferencing. By default, this chart deploys [Traefik](https://traefik.io/)
and uses [IngressRoutes](https://doc.traefik.io/traefik/providers/kubernetes-crd/)
to balance requests across all available nodes.

@@ -267,7 +267,7 @@ from the HTTP endpoint.
$ curl $cluster_ip:8000/v2
```

Follow the [QuickStart](../../docs/quickstart.md) to get the example
Follow the [QuickStart](../../docs/getting_started/quickstart.md) to get the example
image classification client that can be used to perform inferencing
using image classification models on the inference
server. For example,
@@ -284,7 +284,7 @@ Image 'images/mug.jpg':
## Testing Load Balancing and Autoscaling
After you have confirmed that your Triton cluster is operational and can perform inference,
you can test the load balancing and autoscaling features by sending a heavy load of requests.
One option for doing this is using the [perf_analyzer](../../docs/perf_analyzer.md) application.
One option for doing this is using the [perf_analyzer](../../docs/user_guide/perf_analyzer.md) application.

You can apply a progressively increasing load with a command like:
```
6 changes: 3 additions & 3 deletions deploy/mlflow-triton-plugin/README.md
@@ -66,7 +66,7 @@ OUTPUT1 is the element-wise subtraction of INPUT0 and INPUT1.
### Start Triton Inference Server in EXPLICIT mode

The MLflow Triton plugin must work with a running Triton server, see
[documentation](https://github.com/triton-inference-server/server/blob/main/docs/quickstart.md)
[documentation](https://github.com/triton-inference-server/server/blob/main/docs/getting_started/quickstart.md)
of Triton Inference Server for how to start the server. Note that
the server should be run in EXPLICIT mode (`--model-control-mode=explicit`)
to exploit the deployment feature of the plugin.
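Once Triton itself is available (bare metal or container), the relevant part of the launch command is the control-mode flag; the repository path below is a placeholder:

```
$ tritonserver --model-repository=/models --model-control-mode=explicit
```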
@@ -83,7 +83,7 @@ can interact with the server properly:
The MLFlow ONNX built-in functionalities can be used to publish `onnx` flavor
models to MLFlow directly, and the MLFlow Triton plugin will prepare the model
to the format expected by Triton. You may also log
[`config.pbtxt`](](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_model_configuration.md))
[`config.pbtxt`](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_model_configuration.md)
as an additional artifact which Triton will use to serve the model. Otherwise,
the server should be run with auto-complete feature enabled
(`--strict-model-config=false`) to generate the model configuration.
@@ -101,7 +101,7 @@ For other model frameworks that Triton supports but not yet recognized by
the MLFlow Triton plugin, the `publish_model_to_mlflow.py` script can be used to
publish `triton` flavor models to MLflow. A `triton` flavor model is a directory
containing the model files following the
[model layout](https://github.com/triton-inference-server/server/blob/main/docs/model_repository.md#repository-layout).
[model layout](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_repository.md#repository-layout).
Below is an example usage:

```
4 changes: 2 additions & 2 deletions docs/README.md
@@ -52,7 +52,7 @@ Where \<yy.mm\> is the version of Triton that you want to pull. For a complete l
This guide covers the simplest possible workflow for deploying a model using a Triton Inference Server.
- [Create a Model Repository](getting_started/quickstart.md#create-a-model-repository)
- [Launch Triton](getting_started/quickstart.md#launch-triton)
- [Send an Inference Request](getting_started/quickstart.md#sending-an-inference-request)
- [Send an Inference Request](getting_started/quickstart.md#send-an-inference-request)
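Condensed, those three steps look roughly like the sketch below (release tag, ports, and paths are placeholders; the Quickstart linked above has the exact commands):

```
$ git clone -b r22.09 https://github.com/triton-inference-server/server.git
$ cd server/docs/examples && ./fetch_models.sh     # populates model_repository/
$ docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 \
    -v ${PWD}/model_repository:/models \
    nvcr.io/nvidia/tritonserver:22.09-py3 tritonserver --model-repository=/models
```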

Triton Inference Server has a considerable list of versatile and powerful features. All new users are encouraged to explore the [User Guide](README.md#user-guide) and the [additional resources](README.md#resources) sections for features most relevant to their use case.

@@ -122,7 +122,7 @@ Triton supports batching individual inference requests to improve compute resour
- [Stateful Models](user_guide/architecture.md#stateful-models)
- [Control Inputs](user_guide/architecture.md#control-inputs)
- [Implicit State - Stateful Inference Using a Stateless Model](user_guide/architecture.md#implicit-state-management)
- [Sequence Scheduling Strategies](user_guide/architecture.md#scheduling-strateties)
- [Sequence Scheduling Strategies](user_guide/architecture.md#scheduling-strategies)
- [Direct](user_guide/architecture.md#direct)
- [Oldest](user_guide/architecture.md#oldest)

4 changes: 2 additions & 2 deletions docs/examples/jetson/README.md
@@ -51,12 +51,12 @@ Inference Server as a shared library.

## Part 2. Analyzing model performance with perf_analyzer

To analyze model performance on Jetson, [perf_analyzer](https://github.com/triton-inference-server/server/blob/main/docs/perf_analyzer.md) tool is used. The `perf_analyzer` is included in the release tar file or can be compiled from source.
To analyze model performance on Jetson, [perf_analyzer](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/perf_analyzer.md) tool is used. The `perf_analyzer` is included in the release tar file or can be compiled from source.

From this directory of the repository, execute the following to evaluate model performance:

```shell
./perf_analyzer -m peoplenet -b 2 --service-kind=triton_c_api --model-repo=$(pwd)/concurrency_and_dynamic_batching/trtis_model_repo_sample_1 --triton-server-directory=/opt/tritonserver --concurrency-range 1:6 -f perf_c_api.csv
```

In the example above we saved the results as a `.csv` file. To visualize these results, follow the steps described [here](https://github.com/triton-inference-server/server/blob/main/docs/perf_analyzer.md#visualizing-latency-vs-throughput).
In the example above we saved the results as a `.csv` file. To visualize these results, follow the steps described [here](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/perf_analyzer.md#visualizing-latency-vs-throughput).
@@ -326,6 +326,6 @@ dynamic_batching {
}
```

To try further options of dynamic batcher see the [documentation](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md#dynamic-batcher).
To try further options of dynamic batcher see the [documentation](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#dynamic-batcher).

You can also try enabling both concurrent model execution and dynamic batching.
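For reference, the dynamic batcher is configured in the model's `config.pbtxt`; below is a hedged sketch of a stanza with two commonly used options, for a model that does not yet define the stanza (the model name and values are placeholders):

```
$ cat >> model_repository/mymodel/config.pbtxt <<'EOF'
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
EOF
```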
2 changes: 1 addition & 1 deletion docs/getting_started/quickstart.md
@@ -125,7 +125,7 @@ $ curl -v localhost:8000/v2/health/ready
The HTTP request returns status 200 if Triton is ready and non-200 if
it is not ready.

## Send an Infernce Request
## Send an Inference Request

Use docker pull to get the client libraries and examples image
from NGC.
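As a hedged sketch (the \<yy.mm\> tag is a placeholder for the release you are using), fetching the SDK image and running the example client typically looks like:

```
$ docker pull nvcr.io/nvidia/tritonserver:<yy.mm>-py3-sdk
$ docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:<yy.mm>-py3-sdk \
    /workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg
```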
5 changes: 3 additions & 2 deletions docs/user_guide/architecture.md
@@ -311,8 +311,9 @@ description of the model contains variable-sized dimensions, Triton will use *1*
for every variable-sized dimension for the starting request. For other
non-starting requests in the sequence, the input state is the output state of
the previous request in the sequence. For an example ONNX model that uses
implicit state you can refer to
[this ONNX model](../../qa/common/gen_qa_implicit_models.py#L101).
implicit state, you can refer to the ONNX model generated by
`create_onnx_modelfile_wo_initial_state()` in
[this generation script](../../qa/common/gen_qa_implicit_models.py).
This is a simple accumulator model that stores the partial sum of the requests
in a sequence in Triton using implicit state. For state initialization, if the
request is starting, the model sets the "OUTPUT\_STATE" to be equal to the
8 changes: 4 additions & 4 deletions docs/user_guide/faq.md
@@ -35,7 +35,7 @@ same as when using the model's framework directly. However, with
Triton you get benefits like [concurrent model
execution](architecture.md#concurrent-model-execution) (the ability to
run multiple models at the same time on the same GPU) and [dynamic
batching](architecture.md#dynamic-batcher) to get better
batching](model_configuration.md#dynamic-batcher) to get better
throughput. You can also [replace or upgrade models while Triton and
client application are running](model_management.md). Another benefit
is that Triton can be deployed as a Docker container, anywhere – on
@@ -84,7 +84,7 @@ library to suit your specific needs.

In an AWS environment, the Triton Inference Server docker container
can run on [CPU-only instances or GPU compute
instances](../getting_started/quickstart.md#run-triton). Triton can run directly on the
instances](../getting_started/quickstart.md#launch-triton). Triton can run directly on the
compute instance or inside Elastic Kubernetes Service (EKS). In
addition, other AWS services such as Elastic Load Balancer (ELB) can
be used for load balancing traffic among multiple Triton
@@ -121,13 +121,13 @@ concurrency](model_configuration.md#instance-groups) on a
model-by-model basis.

* Triton can [batch together multiple inference requests into a single
inference execution](architecture.md#dynamic-batcher). Typically,
inference execution](model_configuration.md#dynamic-batcher). Typically,
batching inference requests leads to much higher throughput with only
a relatively small increase in latency.
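For example, both techniques can be combined in a model's `config.pbtxt` (a hedged sketch; the model name is a placeholder and the snippet assumes the file does not already define these fields):

```
$ cat >> model_repository/mymodel/config.pbtxt <<'EOF'
instance_group [ { count: 2, kind: KIND_GPU } ]
dynamic_batching { }
EOF
```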

As a general rule, batching is the most beneficial way to increase GPU
utilization. So you should always try enabling the [dynamic
batcher](architecture.md#dynamic-batcher) with your models. Using
batcher](model_configuration.md#dynamic-batcher) with your models. Using
multiple instances of a model can also provide some benefit but is
typically most useful for models that have small compute
requirements. Most models will benefit from using two instances but
2 changes: 1 addition & 1 deletion docs/user_guide/model_configuration.md
@@ -867,7 +867,7 @@ maximum batch size allowed by the model (but see the following section
for the delay option that changes this behavior).

The size of generated batches can be examined in aggregate using
[count metrics](metrics.md#count-metrics).
[count metrics](metrics.md#inference-request-metrics).
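For instance, the aggregated counts are exposed on Triton's Prometheus metrics endpoint (port 8002 by default) and can be spot-checked from the shell:

```
$ curl localhost:8002/metrics | grep nv_inference_count
```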

#### Delayed Batching

2 changes: 1 addition & 1 deletion docs/user_guide/optimization.md
@@ -80,7 +80,7 @@ latency.

For most models, the Triton feature that provides the largest
performance improvement is [dynamic
batching](architecture.md#dynamic-batcher). If your model does not
batching](model_configuration.md#dynamic-batcher). If your model does not
support batching then you can skip ahead to [Model
Instances](#model-instances).
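In practice, the effect of enabling the dynamic batcher is usually measured by sweeping request concurrency with perf_analyzer; the model name below is a placeholder:

```
$ perf_analyzer -m densenet_onnx --concurrency-range 1:4
```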

20 changes: 20 additions & 0 deletions qa/L0_doc_links/mkdocs.yml
@@ -0,0 +1,20 @@
# Copyright (c) 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

site_name: CI Test
use_directory_urls: False
docs_dir: "./repos"
plugins:
- htmlproofer
- search
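For local debugging, the checks that the test script below automates can also be run by hand against this configuration once `repos/` has been populated (a hedged sketch):

```
$ python3 -m pip install mkdocs mkdocs-htmlproofer-plugin
$ mkdocs build -f mkdocs.yml
```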
51 changes: 51 additions & 0 deletions qa/L0_doc_links/test.sh
@@ -0,0 +1,51 @@
# Copyright (c) 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

LOG="`pwd`/log.txt"
CONFIG="`pwd`/mkdocs.yml"
RET=0
# Download necessary packages
python3 -m pip install mkdocs
python3 -m pip install mkdocs-htmlproofer-plugin

# Get the necessary repos
mkdir repos && cd repos
TRITON_BACKEND_REPO_TAG=${TRITON_BACKEND_REPO_TAG:="main"}
echo ${TRITON_BACKEND_REPO_TAG}
git clone --single-branch --depth=1 -b ${TRITON_BACKEND_REPO_TAG} https://github.com/triton-inference-server/backend.git
cd ..

exec mkdocs serve -f $CONFIG > $LOG &
PID=$!
# Time for the compilation to finish. This needs to be increased if other repos
# are added to the test
sleep 20

until [[ (-z `pgrep mkdocs`) ]]; do
kill -2 $PID
sleep 2
done

if [[ ! -z `grep "invalid url" $LOG` ]]; then
cat $LOG
RET=1
fi


if [ $RET -eq 0 ]; then
echo -e "\n***\n*** Test PASSED\n***"
else
echo -e "\n***\n*** Test FAILED\n***"
fi
# exit $RET
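For a local run, something like the following should work from this directory; `TRITON_BACKEND_REPO_TAG` is optional and defaults to `main`:

```
$ cd qa/L0_doc_links
$ TRITON_BACKEND_REPO_TAG=main bash test.sh
$ cat log.txt   # mkdocs/htmlproofer output captured by the script
```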
