Add Quick start guides for Ensemble and BLS (triton-inference-server#755)

* Add Ensemble model and BLS model quick start guide
* Update ensemble_quick_start.md
* Add bls_quick_start.md
* Update newly added quick start guides
* Update BLS and Ensemble quick start guides
* new line
* new line
* Pre commit error fixes
* Pre-commit errors fix
* Modifications

Showing 8 changed files with 377 additions and 0 deletions.

<!--
Copyright (c) 2020-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# BLS Model Quick Start

The steps below will guide you through using Model Analyzer in Docker mode to profile and analyze a simple BLS model: `bls`.

## `Step 1:` Download the BLS model `bls` and composing model `add`

---

**1. Create a new directory and enter it**

```
mkdir <new_dir> && cd <new_dir>
```

**2. Start a git repository**

```
git init && git remote add -f origin https://github.com/triton-inference-server/model_analyzer.git
```

**3. Enable sparse checkout, and download the examples directory, which contains the `bls` and `add` models**

```
git config core.sparseCheckout true && \
echo 'examples' >> .git/info/sparse-checkout && \
git pull origin main
```

## `Step 2:` Pull and Run the SDK Container

---

**1. Pull the SDK container:**

```
docker pull nvcr.io/nvidia/tritonserver:23.09-py3-sdk
```

**2. Run the SDK container**

```
docker run -it --gpus 1 \
  --shm-size 2G \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v $(pwd)/examples/quick-start:$(pwd)/examples/quick-start \
  -v <path-to-output-model-repo>:<path-to-output-model-repo> \
  --net=host nvcr.io/nvidia/tritonserver:23.09-py3-sdk
```

**Replace** `<path-to-output-model-repo>` with the **_absolute path_** to the directory where the output model repository will be located. This ensures the Triton SDK container has access to the model config variants that Model Analyzer creates.<br><br>
**Important:** You must ensure the absolute paths are identical on both sides of the mounts (or else Tritonserver cannot load the model).<br><br>
**Important:** The example above uses a single GPU. If you are running on multiple GPUs, you need to increase the shared memory size accordingly.<br><br>
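
For example, assuming the output model repository will live at `/home/user/output_models` (a hypothetical path used only for illustration), the same absolute path appears on both sides of the mount:

```
# /home/user/output_models is a hypothetical path; substitute your own absolute path,
# keeping it identical on both sides of the -v mount.
docker run -it --gpus 1 \
  --shm-size 2G \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v $(pwd)/examples/quick-start:$(pwd)/examples/quick-start \
  -v /home/user/output_models:/home/user/output_models \
  --net=host nvcr.io/nvidia/tritonserver:23.09-py3-sdk
```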

## `Step 3:` Profile the `bls` model

---

The [examples/quick-start](../examples/quick-start) directory is an example [Triton Model Repository](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_repository.md) that contains the BLS model `bls`, which calculates the sum of two inputs using the `add` model.

An example Model Analyzer YAML config that performs a BLS model search:

```
model_repository: <path-to-examples-quick-start>
profile_models:
  - bls
bls_composing_models: add
perf_analyzer_flags:
  input-data: <path-to-examples-bls_input_data.json>
triton_launch_mode: docker
triton_docker_shm_size: 2G
output_model_repository_path: <path-to-output-model-repo>/<output_dir>
export_path: profile_results
```

**Important:** You must specify an `<output_dir>` subdirectory. You cannot have `output_model_repository_path` point directly to `<path-to-output-model-repo>`.

**Important:** If you already ran this earlier in the container, you can overwrite the earlier results by adding the `override_output_model_repository: true` field to the YAML file.

**Important:** All models must be in the same repository.

**Important:** The [`bls`](../examples/quick-start/bls) model takes "MODEL_NAME" as one of its inputs. The input data JSON file must set "MODEL_NAME" to "add" for this example to function; otherwise, Perf Analyzer will generate random data for "MODEL_NAME", resulting in failed inferences. A sketch of the expected layout is shown below.
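
A minimal sketch of the input data layout, assuming the structure of the `bls_input_data.json` added in this commit (the numeric values below are placeholders, not the actual file contents):

```
{
  "data": [
    {
      "MODEL_NAME": ["add"],
      "INPUT0": [0.1, 0.2, 0.3, 0.4],
      "INPUT1": [0.5, 0.6, 0.7, 0.8]
    }
  ]
}
```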

Run the Model Analyzer `profile` subcommand inside the container with:

```
model-analyzer profile -f /path/to/config.yml
```

---

Model Analyzer uses the [Quick Search](config_search.md#quick-search-mode) algorithm to profile the BLS model. After the quick search completes, Model Analyzer sweeps concurrencies for the top three configurations, then creates a summary report and CSV outputs.

Here is an example result summary, run on a Tesla V100 GPU:

![Result Summary Top](../examples/bls_result_summary_top.jpg)
![Result Summary Table](../examples/bls_result_summary_table.jpg)

You will note that the top model configuration has a higher throughput than the other configurations.

---

The measured data and summary report will be placed inside the
`./profile_results` directory. The directory will be structured as follows.

```
$HOME
  |-- model_analyzer
      |-- profile_results
          |-- perf_analyzer_error.log
          |-- plots
          |   |-- detailed
          |   |   |-- bls_config_7
          |   |   |   `-- latency_breakdown.png
          |   |   |-- bls_config_8
          |   |   |   `-- latency_breakdown.png
          |   |   `-- bls_config_9
          |   |       `-- latency_breakdown.png
          |   `-- simple
          |       |-- bls
          |       |   |-- gpu_mem_v_latency.png
          |       |   `-- throughput_v_latency.png
          |       |-- bls_config_7
          |       |   |-- cpu_mem_v_latency.png
          |       |   |-- gpu_mem_v_latency.png
          |       |   |-- gpu_power_v_latency.png
          |       |   `-- gpu_util_v_latency.png
          |       |-- bls_config_8
          |       |   |-- cpu_mem_v_latency.png
          |       |   |-- gpu_mem_v_latency.png
          |       |   |-- gpu_power_v_latency.png
          |       |   `-- gpu_util_v_latency.png
          |       `-- bls_config_9
          |           |-- cpu_mem_v_latency.png
          |           |-- gpu_mem_v_latency.png
          |           |-- gpu_power_v_latency.png
          |           `-- gpu_util_v_latency.png
          |-- reports
          |   |-- detailed
          |   |   |-- bls_config_7
          |   |   |   `-- detailed_report.pdf
          |   |   |-- bls_config_8
          |   |   |   `-- detailed_report.pdf
          |   |   `-- bls_config_9
          |   |       `-- detailed_report.pdf
          |   `-- summaries
          |       `-- bls
          |           `-- result_summary.pdf
          `-- results
              |-- metrics-model-gpu.csv
              |-- metrics-model-inference.csv
              `-- metrics-server-only.csv
```

**Note:** The configurations above (bls_config_7, bls_config_8, and bls_config_9) were generated as the top configurations when profiling on a single Tesla V100 GPU. Running on multiple GPUs or a different GPU model may result in different top configurations.

<!--
Copyright (c) 2020-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Ensemble Model Quick Start

The steps below will guide you through using Model Analyzer in Docker mode to profile and analyze a simple ensemble model: `ensemble_add_sub`.

## `Step 1:` Download the ensemble model `ensemble_add_sub` and composing models `add`, `sub`

---

**1. Create a new directory and enter it**

```
mkdir <new_dir> && cd <new_dir>
```

**2. Start a git repository**

```
git init && git remote add -f origin https://github.com/triton-inference-server/model_analyzer.git
```

**3. Enable sparse checkout, and download the examples directory, which contains the `ensemble_add_sub`, `add`, and `sub` models**

```
git config core.sparseCheckout true && \
echo 'examples' >> .git/info/sparse-checkout && \
git pull origin main
```

**4. Add a version directory to `ensemble_add_sub`**

```
mkdir examples/quick-start/ensemble_add_sub/1
```

## `Step 2:` Pull and Run the SDK Container

---

**1. Pull the SDK container:**

```
docker pull nvcr.io/nvidia/tritonserver:23.09-py3-sdk
```

**2. Run the SDK container**

```
docker run -it --gpus 1 \
  --shm-size 1G \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v $(pwd)/examples/quick-start:$(pwd)/examples/quick-start \
  -v <path-to-output-model-repo>:<path-to-output-model-repo> \
  --net=host nvcr.io/nvidia/tritonserver:23.09-py3-sdk
```

**Replace** `<path-to-output-model-repo>` with the **_absolute path_** to the directory where the output model repository will be located. This ensures the Triton SDK container has access to the model config variants that Model Analyzer creates.<br><br>
**Important:** You must ensure the absolute paths are identical on both sides of the mounts (or else Tritonserver cannot load the model).<br><br>
**Important:** The example above uses a single GPU. If you are running on multiple GPUs, you may need to increase the shared memory size accordingly.<br><br>

## `Step 3:` Profile the `ensemble_add_sub` model

---

The [examples/quick-start](../examples/quick-start) directory is an example [Triton Model Repository](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_repository.md) that contains the ensemble model `ensemble_add_sub`, which calculates the sum and difference of two inputs using the `add` and `sub` models.

Run the Model Analyzer `profile` subcommand inside the container with:

```
model-analyzer profile \
  --model-repository <path-to-examples-quick-start> \
  --profile-models ensemble_add_sub \
  --triton-launch-mode=docker --triton-docker-shm-size=1G \
  --output-model-repository-path <path-to-output-model-repo>/<output_dir> \
  --export-path profile_results
```
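
If you prefer a YAML config file (as in the BLS guide above), here is a sketch of what an equivalent config might look like, assuming the CLI flags above map one-to-one to the YAML fields shown earlier; it would be passed with `model-analyzer profile -f /path/to/config.yml`:

```
# Sketch only: assumes these YAML fields mirror the CLI flags used above.
model_repository: <path-to-examples-quick-start>
profile_models:
  - ensemble_add_sub
triton_launch_mode: docker
triton_docker_shm_size: 1G
output_model_repository_path: <path-to-output-model-repo>/<output_dir>
export_path: profile_results
```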

**Important:** You must specify an `<output_dir>` subdirectory. You cannot have `--output-model-repository-path` point directly to `<path-to-output-model-repo>`.

**Important:** If you already ran this earlier in the container, you can use the `--override-output-model-repository` option to overwrite the earlier results.

**Important:** All models must be in the same repository.

---

Model Analyzer uses the [Quick Search](config_search.md#quick-search-mode) algorithm to profile the ensemble model. After the quick search completes, Model Analyzer sweeps concurrencies for the top three configurations, then creates a summary report and CSV outputs.

Here is an example result summary, run on a Tesla V100 GPU:

![Result Summary Top](../examples/ensemble_result_summary_top.jpg)
![Result Summary Table](../examples/ensemble_result_summary_table.jpg)

You will note that the top model configuration has a higher throughput than the other configurations.

---

The measured data and summary report will be placed inside the
`./profile_results` directory. The directory will be structured as follows.

```
$HOME
  |-- model_analyzer
      |-- profile_results
          |-- plots
          |   |-- detailed
          |   |   |-- ensemble_add_sub_config_5
          |   |   |   `-- latency_breakdown.png
          |   |   |-- ensemble_add_sub_config_6
          |   |   |   `-- latency_breakdown.png
          |   |   `-- ensemble_add_sub_config_7
          |   |       `-- latency_breakdown.png
          |   `-- simple
          |       |-- ensemble_add_sub
          |       |   |-- gpu_mem_v_latency.png
          |       |   `-- throughput_v_latency.png
          |       |-- ensemble_add_sub_config_5
          |       |   |-- cpu_mem_v_latency.png
          |       |   |-- gpu_mem_v_latency.png
          |       |   |-- gpu_power_v_latency.png
          |       |   `-- gpu_util_v_latency.png
          |       |-- ensemble_add_sub_config_6
          |       |   |-- cpu_mem_v_latency.png
          |       |   |-- gpu_mem_v_latency.png
          |       |   |-- gpu_power_v_latency.png
          |       |   `-- gpu_util_v_latency.png
          |       `-- ensemble_add_sub_config_7
          |           |-- cpu_mem_v_latency.png
          |           |-- gpu_mem_v_latency.png
          |           |-- gpu_power_v_latency.png
          |           `-- gpu_util_v_latency.png
          |-- reports
          |   |-- detailed
          |   |   |-- ensemble_add_sub_config_5
          |   |   |   `-- detailed_report.pdf
          |   |   |-- ensemble_add_sub_config_6
          |   |   |   `-- detailed_report.pdf
          |   |   `-- ensemble_add_sub_config_7
          |   |       `-- detailed_report.pdf
          |   `-- summaries
          |       `-- ensemble_add_sub
          |           `-- result_summary.pdf
          `-- results
              |-- metrics-model-gpu.csv
              |-- metrics-model-inference.csv
              `-- metrics-server-only.csv
```

**Note:** The configurations above (ensemble_add_sub_config_5, ensemble_add_sub_config_6, and ensemble_add_sub_config_7) were generated as the top configurations when profiling on a single Tesla V100 GPU. Running on multiple GPUs or a different GPU model may result in different top configurations.
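
For reference, the input data file added alongside these guides (referenced as `<path-to-examples-bls_input_data.json>` in the BLS config above) sets "MODEL_NAME" to "add":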

```
{
  "data": [
    {
      "MODEL_NAME": [
        "add"
      ],
      "INPUT0": [
        0.74106514,
        0.7371813,
        0.5274665,
        0.13930903
      ],
      "INPUT1": [
        0.7845891,
        0.88089234,
        0.8466405,
        0.55024815
      ]
    }
  ]
}
```