Remote and Docker mode changes #758

Merged
merged 1 commit into from Aug 31, 2023
Changes to remove remote mode and abs path restrictions
nv-braf committed Aug 31, 2023
commit dffbfe5dcd34f0b95da2134bd14ea9e6fb0be620
8 changes: 2 additions & 6 deletions docs/bls_quick_start.md
@@ -59,14 +59,10 @@ docker run -it --gpus 1 \
--shm-size 2G \
-v /var/run/docker.sock:/var/run/docker.sock \
-v $(pwd)/examples/quick-start:$(pwd)/examples/quick-start \
-v <path-to-output-model-repo>:<path-to-output-model-repo> \
--net=host nvcr.io/nvidia/tritonserver:23.09-py3-sdk
```

**Replacing** `<path-to-output-model-repo>` with the
**_absolute_ _path_** to the directory where the output model repository will be located. This ensures the Triton SDK container has access to the model config variants that Model Analyzer creates.<br><br>
**Important:** You must ensure the absolute paths are identical on both sides of the mounts (or else Tritonserver cannot load the model)<br><br>
**Important:** The example above uses a single GPU. If you are running on multiple GPUs, you need to increase the shared memory size accordingly<br><br>
**Important:** The example above uses a single GPU. If you are running on multiple GPUs, you may need to increase the shared memory size accordingly<br><br>
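
For illustration, here is a hypothetical sketch of the mount this change removes (`/home/user/output_repo` is a made-up stand-in for the output model repository path, which previously had to be identical on both sides of the `-v` flag):

```
# Pre-change requirement: mount the output model repository into the SDK container
# at the same absolute path it has on the host, so the sibling Triton container
# launched by Model Analyzer can resolve the model config variants written there.
docker run -it --gpus 1 \
  --shm-size 2G \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v $(pwd)/examples/quick-start:$(pwd)/examples/quick-start \
  -v /home/user/output_repo:/home/user/output_repo \
  --net=host nvcr.io/nvidia/tritonserver:23.09-py3-sdk
```

With this change, that extra mount is no longer required.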

## `Step 3:` Profile the `bls` model

@@ -168,4 +164,4 @@ $HOME
`-- metrics-server-only.csv
```

**Note:** The configurations above, bls_config_7, bls_config_8, and bls_config_9, are generated as the top configurations when profiling on a single Tesla V100 GPU. However, running on multiple GPUs or different GPU models may result in different top configurations.
5 changes: 0 additions & 5 deletions docs/ensemble_quick_start.md
@@ -65,13 +65,9 @@ docker run -it --gpus 1 \
--shm-size 1G \
-v /var/run/docker.sock:/var/run/docker.sock \
-v $(pwd)/examples/quick-start:$(pwd)/examples/quick-start \
-v <path-to-output-model-repo>:<path-to-output-model-repo> \
--net=host nvcr.io/nvidia/tritonserver:23.09-py3-sdk
```

**Replacing** `<path-to-output-model-repo>` with the
**_absolute_ _path_** to the directory where the output model repository will be located. This ensures the Triton SDK container has access to the model config variants that Model Analyzer creates.<br><br>
**Important:** You must ensure the absolute paths are identical on both sides of the mounts (or else Tritonserver cannot load the model)<br><br>
**Important:** The example above uses a single GPU. If you are running on multiple GPUs, you may need to increase the shared memory size accordingly<br><br>

## `Step 3:` Profile the `ensemble_add_sub` model
@@ -101,7 +97,6 @@ model-analyzer profile \

Model Analyzer uses the [Quick Search](config_search.md#quick-search-mode) algorithm to profile the Ensemble model. After the quick search is completed, Model Analyzer sweeps concurrencies for the top three configurations and then creates a summary report and CSV outputs.
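
A minimal sketch of what that profile invocation can look like (flag names are given as recalled from the Model Analyzer CLI and may differ slightly by version; paths and the output location are placeholders):

```
# Hypothetical profile invocation for the ensemble example.
# Quick search mode is what Model Analyzer uses for ensemble models (see above).
model-analyzer profile \
    --model-repository $(pwd)/examples/quick-start \
    --profile-models ensemble_add_sub \
    --run-config-search-mode quick \
    --triton-launch-mode docker \
    --output-model-repository-path <path-to-output-model-repo>/output
```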


Here is an example result summary, run on a Tesla V100 GPU:

![Result Summary Top](../examples/ensemble_result_summary_top.jpg)
29 changes: 10 additions & 19 deletions docs/launch_modes.md
@@ -13,6 +13,7 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Launch Modes

Triton Model Analyzer's `profile` subcommand supports four different launch
@@ -25,7 +26,7 @@ Inference Server.
### Docker

| CLI Option | **`--triton-launch-mode=docker`** |
| - | - |
| ---------- | --------------------------------- |

Note: A full step-by-step example of docker mode can be found in the [Quick Start Guide](quick_start.md).

@@ -40,16 +41,8 @@ following flags are mandatory for correct behavior:

Additionally, Model Analyzer uses the `output_model_repository_path` to
manipulate and store model config variants. When Model Analyzer launches the
Triton container, it does so as a *sibling container*. The launched Triton
container will only have access to the host filesystem. **As a result, in the
docker launch mode, the output model directory will need to be mounted to the
Model Analyzer docker container at the same absolute path it has outside the
container.** So you must add the following when you launch the model analyzer
container as well.

```
-v <path-to-output-model-repository>:<path-to-output-model-repository>
```
Triton container, it does so as a _sibling container_. The launched Triton
container will only have access to the host filesystem.

Finally, when launching model analyzer, the argument `--output-model-repository`
must be provided as a directory inside `<path-to-output-model-repository>`. This
@@ -65,7 +58,7 @@ Triton SDK Container. You will need Docker installed, though.
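
A hedged sketch of a docker-mode invocation, reflecting the flags discussed above (the image name, model name, and paths are placeholders; verify flag names against the CLI help for your version):

```
# Triton is launched as a sibling container, so Model Analyzer needs access to
# the host Docker daemon. If Model Analyzer itself runs inside a container,
# mount /var/run/docker.sock into it as shown in the quick-start guides.
model-analyzer profile \
    --model-repository /path/to/model_repository \
    --profile-models <model-name> \
    --triton-launch-mode docker \
    --triton-docker-image nvcr.io/nvidia/tritonserver:23.09-py3 \
    --output-model-repository-path /path/to/output_model_repository/output
```
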
### Local

| CLI Option | **`--triton-launch-mode=local`** |
| - | - |
| ---------- | -------------------------------- |

Local mode is the default mode if no `triton-launch-mode` is specified.

@@ -80,7 +73,7 @@ have a TritonServer executable
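
A minimal sketch of a local-mode invocation (assuming `tritonserver` is installed on the same machine; the server path is an example and flag names may vary by version):

```
# Triton is launched as a local subprocess on the same machine as Model Analyzer.
model-analyzer profile \
    --model-repository /path/to/model_repository \
    --profile-models <model-name> \
    --triton-launch-mode local \
    --triton-server-path /opt/tritonserver/bin/tritonserver
```
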
### C API

| CLI Option | **`--triton-launch-mode=c_api`** |
| - | - |
| ---------- | -------------------------------- |

In this mode, Triton server is launched locally via the
[C_API](https://github.com/triton-inference-server/server/blob/main/docs/customization_guide/inference_protocols.md#in-process-triton-server-api)
@@ -96,19 +89,17 @@ the Model Analyzer is being used.
The server metrics that Model Analyzer gathers and reports are not available directly
from the triton server when running in C-API mode. Instead, Model Analyzer will attempt to
gather this information itself. This can lead to less precise results, and will generally result
in GPU utilization and power numbers being underreported.
in GPU utilization and power numbers being under-reported.
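
A hedged sketch of a C API-mode invocation (the install path is an assumption and flag names may vary by version):

```
# Triton is loaded in-process through the C API rather than as a separate server,
# so Model Analyzer gathers GPU metrics itself (which may be under-reported).
model-analyzer profile \
    --model-repository /path/to/model_repository \
    --profile-models <model-name> \
    --triton-launch-mode c_api \
    --triton-install-path /opt/tritonserver
```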

### Remote

| CLI Option | **`--triton-launch-mode=remote`** |
| - | - |
| ---------- | --------------------------------- |

This mode is beneficial when you want to use an already running Triton Inference
Server. You may provide the URLs for the Triton instance's HTTP or GRPC endpoint
depending on your chosen client protocol using the `--triton-grpc-endpoint`, and
`--triton-http-endpoint` flags. You should also make sure that same GPUs are
available to the Inference Server and Model Analyzer and they are on the same
machine. Model Analyzer does not currently support profiling remote GPUs. Triton
Server in this mode needs to be launched with `--model-control-mode explicit`
flag to support loading/unloading of the models. The model parameters cannot be
changed in remote mode, though.
machine. Triton Server in this mode needs to be launched with `--model-control-mode explicit`
flag to support loading/unloading of the models.
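
A sketch of the remote-mode workflow described above (endpoints, ports, and paths are examples; in a typical same-machine setup the running server reads from the Model Analyzer output model repository so the generated config variants can be loaded):

```
# 1. Start Triton on the same machine with explicit model control, so Model
#    Analyzer can load and unload model config variants.
tritonserver --model-repository /path/to/output_model_repository \
             --model-control-mode explicit

# 2. Point Model Analyzer at the already running server.
model-analyzer profile \
    --model-repository /path/to/model_repository \
    --profile-models <model-name> \
    --triton-launch-mode remote \
    --triton-http-endpoint localhost:8000
```
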
8 changes: 0 additions & 8 deletions docs/mm_quick_start.md
@@ -58,17 +58,9 @@ docker pull nvcr.io/nvidia/tritonserver:23.08-py3-sdk
docker run -it --gpus all \
-v /var/run/docker.sock:/var/run/docker.sock \
-v $(pwd)/examples/quick-start:$(pwd)/examples/quick-start \
-v <path-to-output-model-repo>:<path-to-output-model-repo> \
--net=host nvcr.io/nvidia/tritonserver:23.08-py3-sdk
```

**Replacing** `<path-to-output-model-repo>` with the
**_absolute_ _path_** to the directory where the output model repository
will be located.
This ensures the Triton SDK container has access to the model
config variants that Model Analyzer creates.<br><br>
**Important:** You must ensure the absolute paths are identical on both sides of the mounts (or else Tritonserver cannot load the model)<br><br>

## `Step 3:` Profile both models concurrently

---
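
A hedged sketch of the concurrent multi-model profile command this step describes (model names and paths are placeholders, and the concurrency flag is given as recalled and may differ by version):

```
# Profile two models at the same time rather than sequentially.
model-analyzer profile \
    --model-repository $(pwd)/examples/quick-start \
    --profile-models <model-1>,<model-2> \
    --triton-launch-mode docker \
    --run-config-profile-models-concurrently-enable \
    --output-model-repository-path <path-to-output-model-repo>/output
```
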
8 changes: 0 additions & 8 deletions docs/quick_start.md
@@ -58,17 +58,9 @@ docker pull nvcr.io/nvidia/tritonserver:23.08-py3-sdk
docker run -it --gpus all \
-v /var/run/docker.sock:/var/run/docker.sock \
-v $(pwd)/examples/quick-start:$(pwd)/examples/quick-start \
-v <path-to-output-model-repo>:<path-to-output-model-repo> \
--net=host nvcr.io/nvidia/tritonserver:23.08-py3-sdk
```

**Replacing** `<path-to-output-model-repo>` with the
**_absolute_ _path_** to the directory where the output model repository
will be located.
This ensures the Triton SDK container has access to the model
config variants that Model Analyzer creates.<br><br>
**Important:** You must ensure the absolute paths are identical on both sides of the mounts (or else Tritonserver cannot load the model)<br><br>

## `Step 3:` Profile the `add_sub` model

---
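
A minimal sketch of the profile command for this step (flags and the output path are given as recalled from the Model Analyzer CLI; verify against the CLI help for your version):

```
# Profile the add_sub model from the quick-start example repository.
model-analyzer profile \
    --model-repository $(pwd)/examples/quick-start \
    --profile-models add_sub \
    --triton-launch-mode docker \
    --output-model-repository-path <path-to-output-model-repo>/output
```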