fix broken links in server #4926

Merged: 5 commits, Oct 5, 2022

4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -47,7 +47,7 @@ proposed change so that the Triton team can provide feedback.
will provide guidance about how and where your enhancement should be
implemented.

- [Testing](docs/test.md) is a critical part of any Triton
- [Testing](docs/customization_guide/test.md) is a critical part of any Triton
enhancement. You should plan on spending significant time on
creating tests for your change. The Triton team will help you to
design your testing so that it is compatible with existing testing
@@ -84,7 +84,7 @@ proposed change so that the Triton team can provide feedback.
- Make sure all `L0_*` tests pass:

- In the `qa/` directory, there are basic sanity tests scripted in
directories named `L0_...`. See the [Test](docs/test.md)
directories named `L0_...`. See the [Test](docs/customization_guide/test.md)
documentation for instructions on running these tests.

- Triton Inference Server's default build assumes recent versions of
8 changes: 4 additions & 4 deletions README.md
@@ -125,8 +125,8 @@ contains additional documentation, presentations, and examples.
The recommended way to build and use Triton Inference Server is with Docker
images.

- [Install Triton Inference Server with Docker containers](docs/customization_guide/build.md#building-triton-with-docker) (*Recommended*)
- [Install Triton Inference Server without Docker containers](docs/customization_guide/build.md#building-triton-without-docker)
- [Install Triton Inference Server with Docker containers](docs/customization_guide/build.md#building-with-docker) (*Recommended*)
- [Install Triton Inference Server without Docker containers](docs/customization_guide/build.md#building-without-docker)
- [Build a custom Triton Inference Server Docker container](docs/customization_guide/compose.md)
- [Build Triton Inference Server from source](docs/customization_guide/build.md#building-on-unsupported-platforms)
- [Build Triton Inference Server for Windows 10](docs/customization_guide/build.md#building-for-windows-10)
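
As a rough sketch of the recommended Docker route, assuming a released NGC image (the `<yy.mm>` tag and the model repository path below are placeholders):

```shell
# Sketch only: pull a released Triton server image from NGC and serve a
# local model repository on the default HTTP, gRPC, and metrics ports.
docker pull nvcr.io/nvidia/tritonserver:<yy.mm>-py3
docker run --gpus=1 --rm \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /full/path/to/model_repository:/models \
  nvcr.io/nvidia/tritonserver:<yy.mm>-py3 \
  tritonserver --model-repository=/models
```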
@@ -213,8 +213,8 @@ designed for modularity and flexibility
### Additional Documentation

- [FAQ](docs/user_guide/faq.md)
- [User Guide](docs#user-guide)
- [Developer Guide](docs#developer-guide)
- [User Guide](docs/README.md#user-guide)
- [Customization Guide](docs/README.md#customization-guide)
- [Release Notes](https://docs.nvidia.com/deeplearning/triton-inference-server/release-notes/index.html)
- [GPU, Driver, and CUDA Support
Matrix](https://docs.nvidia.com/deeplearning/dgx/support-matrix/index.html)
4 changes: 2 additions & 2 deletions deploy/aws/README.md
@@ -98,7 +98,7 @@ in an AWS S3 Storage bucket.
$ aws s3 mb s3://triton-inference-server-repository
```

Following the [QuickStart](../../docs/quickstart.md) download the
Following the [QuickStart](../../docs/getting_started/quickstart.md) download the
example model repository to your system and copy it into the AWS S3
bucket.
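
As a hedged sketch of that copy step, assuming the quickstart's example model repository layout:

```shell
# Sketch only: copy the example model repository into the bucket created
# above. The local path assumes the layout produced by the quickstart.
aws s3 cp --recursive docs/examples/model_repository \
  s3://triton-inference-server-repository/model_repository
```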

@@ -218,7 +218,7 @@ from the HTTP endpoint.
$ curl 34.83.9.133:8000/v2
```

Follow the [QuickStart](../../docs/quickstart.md) to get the example
Follow the [QuickStart](../../docs/getting_started/quickstart.md) to get the example
image classification client that can be used to perform inferencing
using image classification models being served by the inference
server. For example,
4 changes: 2 additions & 2 deletions deploy/fleetcommand/README.md
@@ -62,7 +62,7 @@ For this example you will place the model repository in an S3 Storage bucket
$ aws s3 mb s3://triton-inference-server-repository
```

Following the [QuickStart](../../docs/quickstart.md) download the example model
Following the [QuickStart](../../docs/getting_started/quickstart.md) download the example model
repository to your system and copy it into the AWS S3 bucket.

```
@@ -136,7 +136,7 @@ location has the IP `34.83.9.133`:
$ curl 34.83.9.133:30343/v2
```

Follow the [QuickStart](../../docs/quickstart.md) to get the example image
Follow the [QuickStart](../../docs/getting_started/quickstart.md) to get the example image
classification client that can be used to perform inferencing using image
classification models being served by Triton. For example,

4 changes: 2 additions & 2 deletions deploy/gcp/README.md
@@ -103,7 +103,7 @@ in a Google Cloud Storage bucket.
$ gsutil mb gs://triton-inference-server-repository
```

Following the [QuickStart](../../docs/quickstart.md) download the
Following the [QuickStart](../../docs/getting_started/quickstart.md) download the
example model repository to your system and copy it into the GCS
bucket.
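
A hedged sketch of the copy into GCS, assuming the quickstart's example model repository layout:

```shell
# Sketch only: copy the example model repository into the GCS bucket
# created above.
gsutil cp -r docs/examples/model_repository gs://triton-inference-server-repository/
```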

@@ -256,7 +256,7 @@ from the HTTP endpoint.
$ curl 34.83.9.133:8000/v2
```

Follow the [QuickStart](../../docs/quickstart.md) to get the example
Follow the [QuickStart](../../docs/getting_started/quickstart.md) to get the example
image classification client that can be used to perform inferencing
using image classification models being served by the inference
server. For example,
13 changes: 7 additions & 6 deletions deploy/gke-marketplace-app/README.md
@@ -29,11 +29,12 @@
# NVIDIA Triton Inference Server GKE Marketplace Application

**Table Of Contents**
- [Description](#description)
- [Prerequisites](#prerequisites)
- [Demo Instruction](#demo-instruction)
- [Additional Resources](#additional-resources)
- [Known Issues](#known-issues)
- [NVIDIA Triton Inference Server GKE Marketplace Application](#nvidia-triton-inference-server-gke-marketplace-application)
- [Description](#description)
- [Prerequisites](#prerequisites)
- [Demo Instruction](#demo-instruction)
- [Additional Resources](#additional-resources)
- [Known Issues](#known-issues)

## Description

@@ -145,7 +146,7 @@ The client example push about ~650 QPS(Query per second) to Triton Server, and w

![Locust Client Chart](client.png)

Alternatively, user can opt to use [Perf Analyzer](https://github.com/triton-inference-server/server/blob/master/docs/perf_analyzer.md) to profile and study the performance of Triton Inference Server. Here we also provide a [client script](https://github.com/triton-inference-server/server/tree/master/deploy/gke-marketplace-app/client-sample/perf_analyzer_grpc.sh) to use Perf Analyzer to send gRPC to Triton Server GKE deployment. Perf Analyzer client requires user to use NGC Triton Client Container.
Alternatively, user can opt to use [Perf Analyzer](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/perf_analyzer.md) to profile and study the performance of Triton Inference Server. Here we also provide a [client script](https://github.com/triton-inference-server/server/tree/master/deploy/gke-marketplace-app/client-sample/perf_analyzer_grpc.sh) to use Perf Analyzer to send gRPC to Triton Server GKE deployment. Perf Analyzer client requires user to use NGC Triton Client Container.

```
bash perf_analyzer_grpc.sh ${INGRESS_HOST}:${INGRESS_PORT}
10 changes: 5 additions & 5 deletions deploy/k8s-onprem/README.md
@@ -112,10 +112,10 @@ $ git clone https://github.com/triton-inference-server/server.git
Triton Server needs a repository of models that it will make available
for inferencing. For this example, we are using an existing NFS server and
placing our model files there. See the
[Model Repository documentation](../../docs/model_repository.md) for other
[Model Repository documentation](../../docs/user_guide/model_repository.md) for other
supported locations.

Following the [QuickStart](../../docs/quickstart.md), download the
Following the [QuickStart](../../docs/getting_started/quickstart.md), download the
example model repository to your system and copy it onto your NFS server.
Then, add the url or IP address of your NFS server and the server path of your
model repository to `values.yaml`.
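
One hedged way to stage the models on the NFS share before editing `values.yaml` (the export path and mount point here are hypothetical):

```shell
# Sketch only: mount the NFS export and copy the example models onto it.
# Replace <nfs-server-ip> and the export path with your server's values;
# they are hypothetical here, then point values.yaml at that server and path.
sudo mount -t nfs <nfs-server-ip>:/srv/triton-models /mnt/triton-models
cp -r docs/examples/model_repository/* /mnt/triton-models/
```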
@@ -237,7 +237,7 @@ $ helm install example -f config.yaml .
## Using Triton Inference Server

Now that the inference server is running you can send HTTP or GRPC
requests to it to perform inferencing. By default, this chart deploys [Traefik](traefik.io)
requests to it to perform inferencing. By default, this chart deploys [Traefik](https://traefik.io/)
and uses [IngressRoutes](https://doc.traefik.io/traefik/providers/kubernetes-crd/)
to balance requests across all available nodes.

@@ -267,7 +267,7 @@ from the HTTP endpoint.
$ curl $cluster_ip:8000/v2
```

Follow the [QuickStart](../../docs/quickstart.md) to get the example
Follow the [QuickStart](../../docs/getting_started/quickstart.md) to get the example
image classification client that can be used to perform inferencing
using image classification models on the inference
server. For example,
@@ -284,7 +284,7 @@ Image 'images/mug.jpg':
## Testing Load Balancing and Autoscaling
After you have confirmed that your Triton cluster is operational and can perform inference,
you can test the load balancing and autoscaling features by sending a heavy load of requests.
One option for doing this is using the [perf_analyzer](../../docs/perf_analyzer.md) application.
One option for doing this is using the [perf_analyzer](../../docs/user_guide/perf_analyzer.md) application.

You can apply a progressively increasing load with a command like:
```
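# Hedged sketch, not necessarily the exact command from the README: use
# perf_analyzer to drive an increasing request load against the cluster IP
# captured earlier. The model name "simple" is illustrative.
perf_analyzer -m simple -u $cluster_ip:8000 --concurrency-range 1:10
```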
6 changes: 3 additions & 3 deletions deploy/mlflow-triton-plugin/README.md
@@ -66,7 +66,7 @@ OUTPUT1 is the element-wise subtraction of INPUT0 and INPUT1.
### Start Triton Inference Server in EXPLICIT mode

The MLflow Triton plugin must work with a running Triton server, see
[documentation](https://github.com/triton-inference-server/server/blob/main/docs/quickstart.md)
[documentation](https://github.com/triton-inference-server/server/blob/main/docs/getting_started/quickstart.md)
of Triton Inference Server for how to start the server. Note that
the server should be run in EXPLICIT mode (`--model-control-mode=explicit`)
to exploit the deployment feature of the plugin.
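
For illustration, a minimal sketch of such a launch (the model repository path is a placeholder; the control-mode flag is the one quoted above):

```shell
# Sketch only: start Triton with explicit model control so the plugin can
# load and unload models on demand. /models is a placeholder path.
tritonserver --model-repository=/models --model-control-mode=explicit
```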
@@ -83,7 +83,7 @@ can interact with the server properly:
The MLflow ONNX built-in functionalities can be used to publish `onnx` flavor
models to MLflow directly, and the MLflow Triton plugin will prepare the model
in the format expected by Triton. You may also log
[`config.pbtxt`](](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_model_configuration.md))
[`config.pbtxt`](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_model_configuration.md)
as an additional artifact, which will be used by Triton to serve the model. Otherwise,
the server should be run with auto-complete feature enabled
(`--strict-model-config=false`) to generate the model configuration.
@@ -101,7 +101,7 @@ For other model frameworks that Triton supports but not yet recognized by
the MLFlow Triton plugin, the `publish_model_to_mlflow.py` script can be used to
publish `triton` flavor models to MLflow. A `triton` flavor model is a directory
containing the model files following the
[model layout](https://github.com/triton-inference-server/server/blob/main/docs/model_repository.md#repository-layout).
[model layout](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_repository.md#repository-layout).
Below is an example usage:

```
4 changes: 2 additions & 2 deletions docs/README.md
@@ -52,7 +52,7 @@ Where \<yy.mm\> is the version of Triton that you want to pull. For a complete l
This guide covers the simplest possible workflow for deploying a model using a Triton Inference Server.
- [Create a Model Repository](getting_started/quickstart.md#create-a-model-repository)
- [Launch Triton](getting_started/quickstart.md#launch-triton)
- [Send an Inference Request](getting_started/quickstart.md#sending-an-inference-request)
- [Send an Inference Request](getting_started/quickstart.md#send-an-inference-request)

Triton Inference Server has a considerable list of versatile and powerful features. All new users are encouraged to explore the [User Guide](README.md#user-guide) and the [additional resources](README.md#resources) sections for the features most relevant to their use case.

@@ -122,7 +122,7 @@ Triton supports batching individual inference requests to improve compute resour
- [Stateful Models](user_guide/architecture.md#stateful-models)
- [Control Inputs](user_guide/architecture.md#control-inputs)
- [Implicit State - Stateful Inference Using a Stateless Model](user_guide/architecture.md#implicit-state-management)
- [Sequence Scheduling Strategies](user_guide/architecture.md#scheduling-strateties)
- [Sequence Scheduling Strategies](user_guide/architecture.md#scheduling-strategies)
- [Direct](user_guide/architecture.md#direct)
- [Oldest](user_guide/architecture.md#oldest)

4 changes: 2 additions & 2 deletions docs/examples/jetson/README.md
@@ -51,12 +51,12 @@ Inference Server as a shared library.

## Part 2. Analyzing model performance with perf_analyzer

To analyze model performance on Jetson, [perf_analyzer](https://github.com/triton-inference-server/server/blob/main/docs/perf_analyzer.md) tool is used. The `perf_analyzer` is included in the release tar file or can be compiled from source.
To analyze model performance on Jetson, [perf_analyzer](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/perf_analyzer.md) tool is used. The `perf_analyzer` is included in the release tar file or can be compiled from source.

From this directory of the repository, execute the following to evaluate model performance:

```shell
./perf_analyzer -m peoplenet -b 2 --service-kind=triton_c_api --model-repo=$(pwd)/concurrency_and_dynamic_batching/trtis_model_repo_sample_1 --triton-server-directory=/opt/tritonserver --concurrency-range 1:6 -f perf_c_api.csv
```

In the example above we saved the results as a `.csv` file. To visualize these results, follow the steps described [here](https://github.com/triton-inference-server/server/blob/main/docs/perf_analyzer.md#visualizing-latency-vs-throughput).
In the example above we saved the results as a `.csv` file. To visualize these results, follow the steps described [here](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/perf_analyzer.md#visualizing-latency-vs-throughput).
@@ -326,6 +326,6 @@ dynamic_batching {
}
```

To try further options of dynamic batcher see the [documentation](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md#dynamic-batcher).
To try further options of dynamic batcher see the [documentation](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#dynamic-batcher).

You can also try enabling both concurrent model execution and dynamic batching.
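
As a hedged illustration of combining the two, using standard model-configuration fields with illustrative values:

```shell
# Sketch only: append an instance_group (concurrent execution) and a
# dynamic_batching block to a model's config.pbtxt. Values are illustrative.
cat >> config.pbtxt <<'EOF'
instance_group [
  {
    count: 2
    kind: KIND_GPU
  }
]
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
EOF
```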
2 changes: 1 addition & 1 deletion docs/getting_started/quickstart.md
@@ -125,7 +125,7 @@ $ curl -v localhost:8000/v2/health/ready
The HTTP request returns status 200 if Triton is ready and non-200 if
it is not ready.

## Send an Infernce Request
## Send an Inference Request

Use docker pull to get the client libraries and examples image
from NGC.
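
A hedged sketch of that step, assuming the NGC SDK image naming convention and the example client shipped in it (the model name and image path are illustrative):

```shell
# Sketch only: pull the client/SDK image and run the example image
# classification client against a locally running server.
docker pull nvcr.io/nvidia/tritonserver:<yy.mm>-py3-sdk
docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:<yy.mm>-py3-sdk \
  /workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION \
  /workspace/images/mug.jpg
```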
5 changes: 3 additions & 2 deletions docs/user_guide/architecture.md
@@ -311,8 +311,9 @@ description of the model contains variable-sized dimensions, Triton will use *1*
for every variable-sized dimension for the starting request. For other
non-starting requests in the sequence, the input state is the output state of
the previous request in the sequence. For an example ONNX model that uses
implicit state you can refer to
[this ONNX model](../../qa/common/gen_qa_implicit_models.py#L101).
implicit state you can refer to this onnx model generated from the
`create_onnx_modelfile_wo_initial_state()`
[from this generation script](../../qa/common/gen_qa_implicit_models.py).
This is a simple accumulator model that stores the partial sum of the requests
in a sequence in Triton using implicit state. For state initialization, if the
request is starting, the model sets the "OUTPUT\_STATE" to be equal to the
8 changes: 4 additions & 4 deletions docs/user_guide/faq.md
@@ -35,7 +35,7 @@ same as when using the model's framework directly. However, with
Triton you get benefits like [concurrent model
execution](architecture.md#concurrent-model-execution) (the ability to
run multiple models at the same time on the same GPU) and [dynamic
batching](architecture.md#dynamic-batcher) to get better
batching](model_configuration.md#dynamic-batcher) to get better
throughput. You can also [replace or upgrade models while Triton and
client application are running](model_management.md). Another benefit
is that Triton can be deployed as a Docker container, anywhere – on
@@ -84,7 +84,7 @@ library to suit your specific needs.

In an AWS environment, the Triton Inference Server docker container
can run on [CPU-only instances or GPU compute
instances](../getting_started/quickstart.md#run-triton). Triton can run directly on the
instances](../getting_started/quickstart.md#launch-triton). Triton can run directly on the
compute instance or inside Elastic Kubernetes Service (EKS). In
addition, other AWS services such as Elastic Load Balancer (ELB) can
be used for load balancing traffic among multiple Triton
@@ -121,13 +121,13 @@ concurrency](model_configuration.md#instance-groups) on a
model-by-model basis.

* Triton can [batch together multiple inference requests into a single
inference execution](architecture.md#dynamic-batcher). Typically,
inference execution](model_configuration.md#dynamic-batcher). Typically,
batching inference requests leads to much higher throughput with only
a relatively small increase in latency.

As a general rule, batching is the most beneficial way to increase GPU
utilization. So you should always try enabling the [dynamic
batcher](architecture.md#dynamic-batcher) with your models. Using
batcher](model_configuration.md#dynamic-batcher) with your models. Using
multiple instances of a model can also provide some benefit but is
typically most useful for models that have small compute
requirements. Most models will benefit from using two instances but
2 changes: 1 addition & 1 deletion docs/user_guide/model_configuration.md
@@ -867,7 +867,7 @@ maximum batch size allowed by the model (but see the following section
for the delay option that changes this behavior).

The size of generated batches can be examined in aggregate using
[count metrics](metrics.md#count-metrics).
[count metrics](metrics.md#inference-request-metrics).
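
One hedged way to inspect those counts, assuming the default metrics port (8002) and the `nv_inference_*` metric names exposed on the Prometheus endpoint:

```shell
# Sketch only: the ratio of inference count to execution count gives the
# average batch size actually executed.
curl -s localhost:8002/metrics | grep -E "nv_inference_(count|exec_count)"
```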

#### Delayed Batching

2 changes: 1 addition & 1 deletion docs/user_guide/optimization.md
@@ -80,7 +80,7 @@ latency.

For most models, the Triton feature that provides the largest
performance improvement is [dynamic
batching](architecture.md#dynamic-batcher). If your model does not
batching](model_configuration.md#dynamic-batcher). If your model does not
support batching then you can skip ahead to [Model
Instances](#model-instances).

20 changes: 20 additions & 0 deletions qa/L0_doc_links/mkdocs.yml
@@ -0,0 +1,20 @@
# Copyright (c) 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

site_name: CI Test
use_directory_urls: False
docs_dir: "./repos"
plugins:
- htmlproofer
- search
51 changes: 51 additions & 0 deletions qa/L0_doc_links/test.sh
@@ -0,0 +1,51 @@
# Copyright (c) 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

LOG="`pwd`/log.txt"
CONFIG="`pwd`/mkdocs.yml"
RET=0
# Download necessary packages
python3 -m pip install mkdocs
python3 -m pip install mkdocs-htmlproofer-plugin

# Get the necessary repos
mkdir repos && cd repos
TRITON_BACKEND_REPO_TAG=${TRITON_BACKEND_REPO_TAG:="main"}
echo ${TRITON_BACKEND_REPO_TAG}
git clone --single-branch --depth=1 -b ${TRITON_BACKEND_REPO_TAG} https://github.com/triton-inference-server/backend.git
cd ..

exec mkdocs serve -f $CONFIG > $LOG &
PID=$!
# Time for the compilation to finish. This needs to be increased if other repos
# are added to the test
sleep 20

until [[ (-z `pgrep mkdocs`) ]]; do
kill -2 $PID
sleep 2
done

if [[ ! -z `grep "invalid url" $LOG` ]]; then
cat $LOG
RET=1
fi


if [ $RET -eq 0 ]; then
echo -e "\n***\n*** Test PASSED\n***"
else
echo -e "\n***\n*** Test FAILED\n***"
fi
# exit $RET