Update docs and examples to use Triton 23.06 (nv-morpheus#1037)
- Update docs and examples to use Triton 23.06
- Most examples were using 22.08. A significant speedup is seen in the ONNX->TRT conversion that happens on the first inference run of the pipeline (e.g. SID).

Authors:
  - Eli Fajardo (https://github.com/efajardo-nv)

Approvers:
  - Michael Demoret (https://github.com/mdemoret-nv)

URL: nv-morpheus#1037
efajardo-nv authored Jul 12, 2023
1 parent de563b6 commit 4ef44cf
Showing 14 changed files with 23 additions and 23 deletions.
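A bump like this is mostly mechanical. As a hedged illustration (assuming GNU sed and a clean working tree; the command below is illustrative, not taken from this commit), the image-tag rewrite could be scripted, though prose-only references such as the `22.06` version requirement in getting_started.md still need manual edits:

```bash
# Rewrite any 22.xx/23.xx Triton image tag to 23.06 across docs, examples, and scripts.
# Review `git diff` afterwards; prose-only version references are not touched.
grep -rlE 'tritonserver:2[23]\.[0-9]{2}-py3' --include='*.md' --include='*.yml' --include='*.sh' . \
  | xargs sed -i -E 's/tritonserver:2[23]\.[0-9]{2}-py3/tritonserver:23.06-py3/g'
```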
2 changes: 1 addition & 1 deletion .devcontainer/docker-compose.yml
@@ -20,7 +20,7 @@ services:
  triton:
    container_name: morpheus-triton
    runtime: nvidia
-    image: nvcr.io/nvidia/tritonserver:22.10-py3
+    image: nvcr.io/nvidia/tritonserver:23.06-py3
    command: tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false ${TRITON_MODEL_ARGS}
    ports:
      - 8000:8000
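If the devcontainer stack is already running, the updated image only takes effect once the service is recreated. A minimal sketch, assuming the Compose v2 CLI and the repo root as the working directory:

```bash
# Pull the new image and recreate just the triton service defined above.
docker compose -f .devcontainer/docker-compose.yml pull triton
docker compose -f .devcontainer/docker-compose.yml up -d triton
```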
2 changes: 1 addition & 1 deletion docs/source/basics/building_a_pipeline.md
@@ -223,7 +223,7 @@ This example shows an NLP Pipeline which uses several stages available in Morphe
#### Launching Triton
From the Morpheus repo root directory, run the following to launch Triton and load the `sid-minibert` model:
```bash
-docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:22.08-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model sid-minibert-onnx
+docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model sid-minibert-onnx
```
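Once the container is up, a quick hedged check (assuming Triton's standard HTTP endpoint on the mapped port 8000) confirms the model actually loaded:

```bash
# Returns HTTP 200 once sid-minibert-onnx is loaded and ready to serve.
curl -s -o /dev/null -w '%{http_code}\n' localhost:8000/v2/models/sid-minibert-onnx/ready
```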

#### Launching Kafka
@@ -194,7 +194,7 @@ From the root of the Morpheus project we will launch a Triton Docker container w
```shell
docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 \
-v $PWD/models:/models \
-nvcr.io/nvidia/tritonserver:22.08-py3 \
+nvcr.io/nvidia/tritonserver:23.06-py3 \
tritonserver --model-repository=/models/triton-model-repo \
--exit-on-error=false \
--log-info=true \
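# (Added sketch, not part of the original command.) Since this commit bumps the
# server version, the running server can be confirmed to really be 23.06 by
# querying the v2 metadata endpoint; check the returned "version" field against
# the 23.06 release notes:
curl -s localhost:8000/v2 | python3 -m json.tool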
6 changes: 3 additions & 3 deletions docs/source/getting_started.md
@@ -31,7 +31,7 @@ More advanced users, or those who are interested in using the latest pre-release
- NVIDIA driver `450.80.02` or higher
- [Docker](https://docs.docker.com/get-docker/)
- [The NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker)
-- [NVIDIA Triton Inference Server](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver) `22.06` or higher
+- [NVIDIA Triton Inference Server](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver) `23.06` or higher

> **Note about Docker:**
>
@@ -146,7 +146,7 @@ Many of the validation tests and example workflows require a Triton server to fu
```bash
docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 \
-v $PWD/models:/models \
-nvcr.io/nvidia/tritonserver:22.08-py3 \
+nvcr.io/nvidia/tritonserver:23.06-py3 \
tritonserver --model-repository=/models/triton-model-repo \
--exit-on-error=false \
--log-info=true \
@@ -160,7 +160,7 @@ Note: The above command is useful for testing out Morpheus, however it does load
```bash
docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 \
-v $PWD/models:/models \
-nvcr.io/nvidia/tritonserver:22.08-py3 \
+nvcr.io/nvidia/tritonserver:23.06-py3 \
tritonserver --model-repository=/models/triton-model-repo \
--exit-on-error=false \
--log-info=true \
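# (Added sketch, not part of the original command.) To see which models the
# server actually loaded (useful when comparing the load-everything command
# against the selective one), query Triton's repository index extension; each
# entry reports a model's name, version, and state (e.g. READY):
curl -s -X POST localhost:8000/v2/repository/index | python3 -m json.tool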
4 changes: 2 additions & 2 deletions examples/abp_nvsmi_detection/README.md
@@ -65,12 +65,12 @@ This example utilizes the Triton Inference Server to perform inference.

Pull the Docker image for Triton:
```bash
-docker pull nvcr.io/nvidia/tritonserver:22.08-py3
+docker pull nvcr.io/nvidia/tritonserver:23.06-py3
```

From the Morpheus repo root directory, run the following to launch Triton and load the `abp-nvsmi-xgb` XGBoost model:
```bash
-docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:22.08-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model abp-nvsmi-xgb
+docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model abp-nvsmi-xgb
```

This will launch Triton and only load the `abp-nvsmi-xgb` model. This model has been configured with a max batch size of 32768, and to use dynamic batching for increased performance.
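The batch size and dynamic batching settings mentioned above live in the model's Triton configuration and can be inspected over HTTP; a minimal sketch, assuming the server is reachable on the mapped port 8000:

```bash
# The returned JSON should include "max_batch_size": 32768 and a "dynamic_batching" section.
curl -s localhost:8000/v2/models/abp-nvsmi-xgb/config | python3 -m json.tool
```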
4 changes: 2 additions & 2 deletions examples/abp_pcap_detection/README.md
@@ -23,7 +23,7 @@ To run this example, an instance of Triton Inference Server and a sample dataset

### Triton Inference Server
```bash
-docker pull nvcr.io/nvidia/tritonserver:22.08-py3
+docker pull nvcr.io/nvidia/tritonserver:23.06-py3
```

##### Deploy Triton Inference Server
@@ -35,7 +35,7 @@ From the root of the Morpheus repo, navigate to the anomalous behavior profiling
cd examples/abp_pcap_detection

# Launch the container
-docker run --rm --gpus=all -p 8000:8000 -p 8001:8001 -p 8002:8002 -v $PWD/abp-pcap-xgb:/models/abp-pcap-xgb --name tritonserver nvcr.io/nvidia/tritonserver:22.08-py3 tritonserver --model-repository=/models --exit-on-error=false
+docker run --rm --gpus=all -p 8000:8000 -p 8001:8001 -p 8002:8002 -v $PWD/abp-pcap-xgb:/models/abp-pcap-xgb --name tritonserver nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models --exit-on-error=false
```

##### Verify Model Deployment
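Because the container above is started with `--name tritonserver`, its startup output is easy to inspect; Triton typically prints a model status table near the end of startup. A hedged sketch, assuming Docker runs locally:

```bash
# Follow the server log and look for abp-pcap-xgb with status READY in the model table.
docker logs -f tritonserver
```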
4 changes: 2 additions & 2 deletions examples/log_parsing/README.md
@@ -26,7 +26,7 @@ Pull Docker image from NGC (https://ngc.nvidia.com/catalog/containers/nvidia:tri
Example:

```bash
-docker pull nvcr.io/nvidia/tritonserver:22.08-py3
+docker pull nvcr.io/nvidia/tritonserver:23.06-py3
```

##### Setup Env Variable
@@ -38,7 +38,7 @@ export MORPHEUS_ROOT=$(pwd)
From the Morpheus repo root directory, run the following to launch Triton and load the `log-parsing-onnx` model:

```bash
-docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:22.08-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model log-parsing-onnx
+docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model log-parsing-onnx
```

##### Verify Model Deployment
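Beyond a simple readiness probe, the model metadata endpoint shows the platform plus the input and output tensors the model declares; a minimal sketch:

```bash
# Prints model metadata (platform, inputs, outputs) for log-parsing-onnx.
curl -s localhost:8000/v2/models/log-parsing-onnx | python3 -m json.tool
```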
4 changes: 2 additions & 2 deletions examples/nlp_si_detection/README.md
@@ -77,10 +77,10 @@ This example utilizes the Triton Inference Server to perform inference. The neur
From the Morpheus repo root directory, run the following to launch Triton and load the `sid-minibert` model:

```bash
-docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:22.08-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model sid-minibert-onnx
+docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model sid-minibert-onnx
```

-Where `22.02-py3` can be replaced with the current year and month of the Triton version to use. For example, to use May 2021, specify `nvcr.io/nvidia/tritonserver:21.05-py3`. Ensure that the version of TensorRT that is used in Triton matches the version of TensorRT elsewhere (refer to [NGC Deep Learning Frameworks Support Matrix](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html)).
+Where `23.06-py3` can be replaced with the current year and month of the Triton version to use. For example, to use May 2021, specify `nvcr.io/nvidia/tritonserver:21.05-py3`. Ensure that the version of TensorRT that is used in Triton matches the version of TensorRT elsewhere (refer to [NGC Deep Learning Frameworks Support Matrix](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html)).

This will launch Triton and only load the `sid-minibert-onnx` model. This model has been configured with a max batch size of 32, and to use dynamic batching for increased performance.

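To check which TensorRT build ships in a given Triton container, one option (an assumption: that the TensorRT libraries are installed as Debian packages, as they are in NGC images) is to query the package database:

```bash
# List TensorRT-related packages baked into the 23.06 Triton image.
docker run --rm nvcr.io/nvidia/tritonserver:23.06-py3 bash -c "dpkg -l | grep -iE 'tensorrt|nvinfer'"
```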
4 changes: 2 additions & 2 deletions examples/ransomware_detection/README.md
@@ -27,7 +27,7 @@ Pull Docker image from NGC (https://ngc.nvidia.com/catalog/containers/nvidia:tri
Example:

```bash
-docker pull nvcr.io/nvidia/tritonserver:22.08-py3
+docker pull nvcr.io/nvidia/tritonserver:23.06-py3
```
##### Setup Env Variable
```bash
@@ -39,7 +39,7 @@ Run the following from the `examples/ransomware_detection` directory to launch T

```bash
# Run Triton in explicit mode
-docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models/triton-model-repo nvcr.io/nvidia/tritonserver:22.08-py3 \
+docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models/triton-model-repo nvcr.io/nvidia/tritonserver:23.06-py3 \
tritonserver --model-repository=/models/triton-model-repo \
--exit-on-error=false \
--model-control-mode=explicit \
4 changes: 2 additions & 2 deletions examples/root_cause_analysis/README.md
@@ -46,10 +46,10 @@ This example utilizes the Triton Inference Server to perform inference. The bina
From the Morpheus repo root directory, run the following to launch Triton and load the `root-cause-binary-onnx` model:

```bash
-docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:22.08-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model root-cause-binary-onnx
+docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model root-cause-binary-onnx
```

-Where `22.08-py3` can be replaced with the current year and month of the Triton version to use. For example, to use May 2021, specify `nvcr.io/nvidia/tritonserver:21.05-py3`. Ensure that the version of TensorRT that is used in Triton matches the version of TensorRT elsewhere (refer to [NGC Deep Learning Frameworks Support Matrix](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html)).
+Where `23.06-py3` can be replaced with the current year and month of the Triton version to use. For example, to use May 2021, specify `nvcr.io/nvidia/tritonserver:21.05-py3`. Ensure that the version of TensorRT that is used in Triton matches the version of TensorRT elsewhere (refer to [NGC Deep Learning Frameworks Support Matrix](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html)).

This will launch Triton and only load the model required by our example pipeline. The model has been configured with a max batch size of 32, and to use dynamic batching for increased performance.

2 changes: 1 addition & 1 deletion examples/sid_visualization/docker-compose.yml
@@ -25,7 +25,7 @@ x-with-gpus: &with_gpus

services:
  triton:
-    image: nvcr.io/nvidia/tritonserver:22.08-py3
+    image: nvcr.io/nvidia/tritonserver:23.06-py3
    <<: *with_gpus
    command: "tritonserver --exit-on-error=false --model-control-mode=explicit --load-model sid-minibert-onnx --model-repository=/models/triton-model-repo"
    environment:
2 changes: 1 addition & 1 deletion scripts/validation/val-globals.sh
@@ -26,7 +26,7 @@ export e="\033[0;90m"
export y="\033[0;33m"
export x="\033[0m"

-export TRITON_IMAGE=${TRITON_IMAGE:-"nvcr.io/nvidia/tritonserver:22.08-py3"}
+export TRITON_IMAGE=${TRITON_IMAGE:-"nvcr.io/nvidia/tritonserver:23.06-py3"}

# TRITON_GRPC_PORT is only used when TRITON_URL is undefined
export TRITON_GRPC_PORT=${TRITON_GRPC_PORT:-"8001"}
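Because the script uses the `${TRITON_IMAGE:-...}` default-expansion pattern (use the variable's existing value if set, otherwise the quoted default), the image can be overridden per run without editing the file:

```bash
# Example override (hypothetical newer tag; check NGC for what is actually published).
export TRITON_IMAGE="nvcr.io/nvidia/tritonserver:23.07-py3"
```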
2 changes: 1 addition & 1 deletion scripts/validation/val-utils.sh
@@ -68,7 +68,7 @@ function wait_for_triton {

function ensure_triton_running {

-TRITON_IMAGE=${TRITON_IMAGE:-"nvcr.io/nvidia/tritonserver:22.08-py3"}
+TRITON_IMAGE=${TRITON_IMAGE:-"nvcr.io/nvidia/tritonserver:23.06-py3"}

IS_RUNNING=$(is_triton_running)

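The body of the repo's own `wait_for_triton` is not shown in this hunk; as an assumption-labeled sketch (not the repository's implementation), a readiness wait can poll Triton's HTTP health endpoint like this:

```bash
# Hypothetical helper: poll the readiness endpoint until Triton responds or we time out.
wait_for_triton_sketch() {
    local url="localhost:8000"   # HTTP port mapped in the docker run commands above
    local attempt
    for attempt in $(seq 1 30); do
        if curl -sf "http://${url}/v2/health/ready" > /dev/null; then
            echo "Triton is ready (attempt ${attempt})"
            return 0
        fi
        sleep 2
    done
    echo "Timed out waiting for Triton" >&2
    return 1
}
```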
4 changes: 2 additions & 2 deletions tests/benchmarks/README.md
@@ -24,14 +24,14 @@ Pull Docker image from NGC (https://ngc.nvidia.com/catalog/containers/nvidia:tri
Example:

```
-docker pull nvcr.io/nvidia/tritonserver:23.03-py3
+docker pull nvcr.io/nvidia/tritonserver:23.06-py3
```

##### Start Triton Inference Server container
```
cd ${MORPHEUS_ROOT}/models
-docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD:/models nvcr.io/nvidia/tritonserver:23.03-py3 tritonserver --model-repository=/models/triton-model-repo --model-control-mode=explicit --load-model sid-minibert-onnx --load-model abp-nvsmi-xgb --load-model phishing-bert-onnx
+docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD:/models nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models/triton-model-repo --model-control-mode=explicit --load-model sid-minibert-onnx --load-model abp-nvsmi-xgb --load-model phishing-bert-onnx
```

##### Verify Model Deployments
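With the three models loaded explicitly, per-model readiness can be checked before benchmarking; a minimal sketch against the HTTP port mapped above:

```bash
# Each request should print 200 once the corresponding model reports READY.
for m in sid-minibert-onnx abp-nvsmi-xgb phishing-bert-onnx; do
    printf '%s: ' "$m"
    curl -s -o /dev/null -w '%{http_code}\n' "localhost:8000/v2/models/${m}/ready"
done
```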
