Remote and Docker mode changes #758

Merged
merged 1 commit into from Aug 31, 2023
Changes to remove remote mode and abs path restrictions
nv-braf committed Aug 31, 2023
commit dffbfe5dcd34f0b95da2134bd14ea9e6fb0be620
8 changes: 2 additions & 6 deletions docs/bls_quick_start.md
@@ -59,14 +59,10 @@ docker run -it --gpus 1 \
--shm-size 2G \
-v /var/run/docker.sock:/var/run/docker.sock \
-v $(pwd)/examples/quick-start:$(pwd)/examples/quick-start \
-v <path-to-output-model-repo>:<path-to-output-model-repo> \
--net=host nvcr.io/nvidia/tritonserver:23.09-py3-sdk
```

**Replacing** `<path-to-output-model-repo>` with the
**_absolute_ _path_** to the directory where the output model repository will be located. This ensures the Triton SDK container has access to the model config variants that Model Analyzer creates.<br><br>
**Important:** You must ensure the absolute paths are identical on both sides of the mounts (or else Tritonserver cannot load the model)<br><br>
**Important:** The example above uses a single GPU. If you are running on multiple GPUs, you need to increase the shared memory size accordingly<br><br>
**Important:** The example above uses a single GPU. If you are running on multiple GPUs, you may need to increase the shared memory size accordingly<br><br>
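
For illustration, here is a hypothetical sketch of the mount this change removes (`/home/user/output_repo` is a made-up stand-in for the output model repository path, which previously had to be identical on both sides of the `-v` flag):

```
# Pre-change requirement: mount the output model repository into the SDK container
# at the same absolute path it has on the host, so the sibling Triton container
# launched by Model Analyzer can resolve the model config variants written there.
docker run -it --gpus 1 \
  --shm-size 2G \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v $(pwd)/examples/quick-start:$(pwd)/examples/quick-start \
  -v /home/user/output_repo:/home/user/output_repo \
  --net=host nvcr.io/nvidia/tritonserver:23.09-py3-sdk
```

With this change, that extra mount is no longer required.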

## `Step 3:` Profile the `bls` model

@@ -168,4 +164,4 @@ $HOME
`-- metrics-server-only.csv
```

**Note:** The configurations above, bls_config_7, bls_config_8, and bls_config_9, are generated as the top configurations when profiling on a single Tesla V100 GPU. However, running on multiple GPUs or different GPU models may result in different top configurations.
5 changes: 0 additions & 5 deletions docs/ensemble_quick_start.md
@@ -65,13 +65,9 @@ docker run -it --gpus 1 \
--shm-size 1G \
-v /var/run/docker.sock:/var/run/docker.sock \
-v $(pwd)/examples/quick-start:$(pwd)/examples/quick-start \
-v <path-to-output-model-repo>:<path-to-output-model-repo> \
--net=host nvcr.io/nvidia/tritonserver:23.09-py3-sdk
```

**Replacing** `<path-to-output-model-repo>` with the
**_absolute_ _path_** to the directory where the output model repository will be located. This ensures the Triton SDK container has access to the model config variants that Model Analyzer creates.<br><br>
**Important:** You must ensure the absolute paths are identical on both sides of the mounts (or else Tritonserver cannot load the model)<br><br>
**Important:** The example above uses a single GPU. If you are running on multiple GPUs, you may need to increase the shared memory size accordingly<br><br>

## `Step 3:` Profile the `ensemble_add_sub` model
@@ -101,7 +97,6 @@ model-analyzer profile \

Model Analyzer uses the [Quick Search](config_search.md#quick-search-mode) algorithm to profile the Ensemble model. After the quick search is completed, Model Analyzer sweeps concurrencies for the top three configurations and then creates a summary report and CSV outputs.
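
A minimal sketch of what that profile invocation can look like (flag names are given as recalled from the Model Analyzer CLI and may differ slightly by version; paths and the output location are placeholders):

```
# Hypothetical profile invocation for the ensemble example.
# Quick search mode is what Model Analyzer uses for ensemble models (see above).
model-analyzer profile \
    --model-repository $(pwd)/examples/quick-start \
    --profile-models ensemble_add_sub \
    --run-config-search-mode quick \
    --triton-launch-mode docker \
    --output-model-repository-path <path-to-output-model-repo>/output
```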


Here is an example result summary, run on a Tesla V100 GPU:

![Result Summary Top](../examples/ensemble_result_summary_top.jpg)
29 changes: 10 additions & 19 deletions docs/launch_modes.md
@@ -13,6 +13,7 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Launch Modes

Triton Model Analyzer's `profile` subcommand supports four different launch
@@ -25,7 +26,7 @@ Inference Server.
### Docker

| CLI Option | **`--triton-launch-mode=docker`** |
| - | - |
| ---------- | --------------------------------- |

Note: A full step-by-step example of docker mode can be found in the [Quick Start Guide](quick_start.md).

@@ -40,16 +41,8 @@ following flags are mandatory for correct behavior:

Additionally, Model Analyzer uses the `output_model_repository_path` to
manipulate and store model config variants. When Model Analyzer launches the
Triton container, it does so as a *sibling container*. The launched Triton
container will only have access to the host filesystem. **As a result, in the
docker launch mode, the output model directory will need to be mounted to the
Model Analyzer docker container at the same absolute path it has outside the
container.** So you must add the following when you launch the model analyzer
container as well.

```
-v <path-to-output-model-repository>:<path-to-output-model-repository>
```
Triton container, it does so as a _sibling container_. The launched Triton
container will only have access to the host filesystem.

Finally, when launching model analyzer, the argument `--output-model-repository`
must be provided as a directory inside `<path-to-output-model-repository>`. This
@@ -65,7 +58,7 @@ Triton SDK Container. You will need Docker installed, though.
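
A hedged sketch of a docker-mode invocation, reflecting the flags discussed above (the image name, model name, and paths are placeholders; verify flag names against the CLI help for your version):

```
# Triton is launched as a sibling container, so Model Analyzer needs access to
# the host Docker daemon. If Model Analyzer itself runs inside a container,
# mount /var/run/docker.sock into it as shown in the quick-start guides.
model-analyzer profile \
    --model-repository /path/to/model_repository \
    --profile-models <model-name> \
    --triton-launch-mode docker \
    --triton-docker-image nvcr.io/nvidia/tritonserver:23.09-py3 \
    --output-model-repository-path /path/to/output_model_repository/output
```
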
### Local

| CLI Option | **`--triton-launch-mode=local`** |
| - | - |
| ---------- | -------------------------------- |

Local mode is the default mode if no `triton-launch-mode` is specified.

@@ -80,7 +73,7 @@ have a TritonServer executable
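
A minimal sketch of a local-mode invocation (assuming `tritonserver` is installed on the same machine; the server path is an example and flag names may vary by version):

```
# Triton is launched as a local subprocess on the same machine as Model Analyzer.
model-analyzer profile \
    --model-repository /path/to/model_repository \
    --profile-models <model-name> \
    --triton-launch-mode local \
    --triton-server-path /opt/tritonserver/bin/tritonserver
```
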
### C API

| CLI Option | **`--triton-launch-mode=c_api`** |
| - | - |
| ---------- | -------------------------------- |

In this mode, Triton server is launched locally via the
[C_API](https://github.com/triton-inference-server/server/blob/main/docs/customization_guide/inference_protocols.md#in-process-triton-server-api)
@@ -96,19 +89,17 @@ the Model Analyzer is being used.
The server metrics that Model Analyzer gathers and reports are not available directly
from the triton server when running in C-API mode. Instead, Model Analyzer will attempt to
gather this information itself. This can lead to less precise results, and will generally result
in GPU utilization and power numbers being underreported.
in GPU utilization and power numbers being under-reported.
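
A hedged sketch of a C API-mode invocation (the install path is an assumption and flag names may vary by version):

```
# Triton is loaded in-process through the C API rather than as a separate server,
# so Model Analyzer gathers GPU metrics itself (which may be under-reported).
model-analyzer profile \
    --model-repository /path/to/model_repository \
    --profile-models <model-name> \
    --triton-launch-mode c_api \
    --triton-install-path /opt/tritonserver
```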

### Remote

| CLI Option | **`--triton-launch-mode=remote`** |
| - | - |
| ---------- | --------------------------------- |

This mode is beneficial when you want to use an already running Triton Inference
Server. You may provide the URLs for the Triton instance's HTTP or GRPC endpoint
depending on your chosen client protocol using the `--triton-grpc-endpoint`, and
`--triton-http-endpoint` flags. You should also make sure that same GPUs are
available to the Inference Server and Model Analyzer and they are on the same
machine. Model Analyzer does not currently support profiling remote GPUs. Triton
Server in this mode needs to be launched with `--model-control-mode explicit`
flag to support loading/unloading of the models. The model parameters cannot be
changed in remote mode, though.
machine. Triton Server in this mode needs to be launched with `--model-control-mode explicit`
flag to support loading/unloading of the models.
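
A sketch of the remote-mode workflow described above (endpoints, ports, and paths are examples; in a typical same-machine setup the running server reads from the Model Analyzer output model repository so the generated config variants can be loaded):

```
# 1. Start Triton on the same machine with explicit model control, so Model
#    Analyzer can load and unload model config variants.
tritonserver --model-repository /path/to/output_model_repository \
             --model-control-mode explicit

# 2. Point Model Analyzer at the already running server.
model-analyzer profile \
    --model-repository /path/to/model_repository \
    --profile-models <model-name> \
    --triton-launch-mode remote \
    --triton-http-endpoint localhost:8000
```
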
8 changes: 0 additions & 8 deletions docs/mm_quick_start.md
@@ -58,17 +58,9 @@ docker pull nvcr.io/nvidia/tritonserver:23.08-py3-sdk
docker run -it --gpus all \
-v /var/run/docker.sock:/var/run/docker.sock \
-v $(pwd)/examples/quick-start:$(pwd)/examples/quick-start \
-v <path-to-output-model-repo>:<path-to-output-model-repo> \
--net=host nvcr.io/nvidia/tritonserver:23.08-py3-sdk
```

**Replacing** `<path-to-output-model-repo>` with the
**_absolute_ _path_** to the directory where the output model repository
will be located.
This ensures the Triton SDK container has access to the model
config variants that Model Analyzer creates.<br><br>
**Important:** You must ensure the absolute paths are identical on both sides of the mounts (or else Tritonserver cannot load the model)<br><br>

## `Step 3:` Profile both models concurrently

---
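
A hedged sketch of the concurrent multi-model profile command this step describes (model names and paths are placeholders, and the concurrency flag is given as recalled and may differ by version):

```
# Profile two models at the same time rather than sequentially.
model-analyzer profile \
    --model-repository $(pwd)/examples/quick-start \
    --profile-models <model-1>,<model-2> \
    --triton-launch-mode docker \
    --run-config-profile-models-concurrently-enable \
    --output-model-repository-path <path-to-output-model-repo>/output
```
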
8 changes: 0 additions & 8 deletions docs/quick_start.md
@@ -58,17 +58,9 @@ docker pull nvcr.io/nvidia/tritonserver:23.08-py3-sdk
docker run -it --gpus all \
-v /var/run/docker.sock:/var/run/docker.sock \
-v $(pwd)/examples/quick-start:$(pwd)/examples/quick-start \
-v <path-to-output-model-repo>:<path-to-output-model-repo> \
--net=host nvcr.io/nvidia/tritonserver:23.08-py3-sdk
```

**Replacing** `<path-to-output-model-repo>` with the
**_absolute_ _path_** to the directory where the output model repository
will be located.
This ensures the Triton SDK container has access to the model
config variants that Model Analyzer creates.<br><br>
**Important:** You must ensure the absolute paths are identical on both sides of the mounts (or else Tritonserver cannot load the model)<br><br>

## `Step 3:` Profile the `add_sub` model

---
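
A minimal sketch of the profile command for this step (flags and the output path are given as recalled from the Model Analyzer CLI; verify against the CLI help for your version):

```
# Profile the add_sub model from the quick-start example repository.
model-analyzer profile \
    --model-repository $(pwd)/examples/quick-start \
    --profile-models add_sub \
    --triton-launch-mode docker \
    --output-model-repository-path <path-to-output-model-repo>/output
```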