Starting with the r20.10 release, two Docker images are available from NVIDIA GPU Cloud (NGC) that make it easy to construct customized versions of Triton. Customizing Triton lets you significantly reduce the size of the Triton image by removing functionality that you don't require.
Currently the customization is limited, as described below, but future releases will increase the amount of customization that is available. It is also possible to build Triton yourself to get more exact customization.
The two Docker images used for customization are retrieved using the following commands:
$ docker pull nvcr.io/nvidia/tritonserver:<xx.yy>-py3-min
$ docker pull nvcr.io/nvidia/tritonserver:<xx.yy>-py3
Here <xx.yy> is the version of Triton that you want to customize. The <xx.yy>-py3-min image is a minimal base image that contains the dependencies (CUDA, cuDNN, and so on) that are required to run Triton. The <xx.yy>-py3 image contains the complete Triton with all options and backends.
To create an image containing the smallest possible Triton, use the following multi-stage Dockerfile. As mentioned above, the amount of customization currently available is limited. As a result, the minimum Triton still contains both HTTP/REST and GRPC endpoints; S3, GCS and Azure Storage filesystem support; and the TensorRT and legacy custom backends.
FROM nvcr.io/nvidia/tritonserver:<xx.yy>-py3 as full
FROM nvcr.io/nvidia/tritonserver:<xx.yy>-py3-min
COPY --from=full /opt/tritonserver/bin /opt/tritonserver/bin
COPY --from=full /opt/tritonserver/lib /opt/tritonserver/lib
Then build the image.
$ docker build -t tritonserver_min .
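To check how much smaller the customized image actually is, the two image sizes can be compared once both are present locally. The following is a minimal sketch, not part of Triton: the function names are hypothetical, and it assumes the Docker CLI is installed and that the customized image is tagged tritonserver_min as in the build command above.

```shell
#!/bin/sh
# Integer percentage by which the second size is smaller than the first
# (truncated by shell integer division).
size_reduction_pct() {
    full_bytes="$1"
    min_bytes="$2"
    echo $(( (full_bytes - min_bytes) * 100 / full_bytes ))
}

# Size of a local image in bytes, as reported by Docker.
image_size_bytes() {
    docker image inspect --format '{{.Size}}' "$1"
}

# Compare the full image ($1) with the customized one
# ($2, defaulting to the tritonserver_min tag built above).
report_reduction() {
    full_image="$1"
    min_image="${2:-tritonserver_min}"
    full_bytes="$(image_size_bytes "$full_image")" || return 1
    min_bytes="$(image_size_bytes "$min_image")" || return 1
    echo "${min_image} is $(size_reduction_pct "$full_bytes" "$min_bytes")% smaller than ${full_image}"
}
```

After sourcing the script, call report_reduction with the full image reference, replacing the <xx.yy> placeholder with your Triton version.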
One or more of the PyTorch, TensorFlow1, TensorFlow2, ONNX Runtime, Python, and DALI backends can be added to the minimum Triton image. Each backend can be built from scratch, or the appropriate backend directory can be copied from the full Triton image. For example, to create an image that contains the minimum Triton plus support for TensorFlow1, use the following Dockerfile.
FROM nvcr.io/nvidia/tritonserver:<xx.yy>-py3 as full
FROM nvcr.io/nvidia/tritonserver:<xx.yy>-py3-min
COPY --from=full /opt/tritonserver/bin /opt/tritonserver/bin
COPY --from=full /opt/tritonserver/lib /opt/tritonserver/lib
COPY --from=full /opt/tritonserver/backends/tensorflow1 /opt/tritonserver/backends/tensorflow1
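When several backends are needed, the same Dockerfile pattern can be generated rather than written by hand. The sketch below is one possible approach: gen_triton_dockerfile is a hypothetical helper, and it assumes every backend lives in a directory under /opt/tritonserver/backends named after the argument you pass. Only tensorflow1 is confirmed by the example above, so check the directory names in the full image before relying on others.

```shell
#!/bin/sh
# gen_triton_dockerfile VERSION [BACKEND...]
# Prints a multi-stage Dockerfile that copies the Triton binaries, the
# libraries, and one directory per requested backend from the full image.
gen_triton_dockerfile() {
    version="$1"
    shift
    image="nvcr.io/nvidia/tritonserver:${version}"
    printf 'FROM %s-py3 as full\n' "$image"
    printf 'FROM %s-py3-min\n' "$image"
    printf 'COPY --from=full /opt/tritonserver/bin /opt/tritonserver/bin\n'
    printf 'COPY --from=full /opt/tritonserver/lib /opt/tritonserver/lib\n'
    # Assumption: the backend directory name matches the argument exactly.
    for backend in "$@"; do
        printf 'COPY --from=full /opt/tritonserver/backends/%s /opt/tritonserver/backends/%s\n' \
            "$backend" "$backend"
    done
}
```

Redirect the output of gen_triton_dockerfile to a file named Dockerfile, passing your Triton version followed by the backend names (for example tensorflow1), and then build with docker build as before.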
Depending on the backend it may also be necessary to include additional dependencies in the image. For example, the Python backend requires that Python3 be installed in the image.
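As an illustration of adding such a dependency, the sketch below prints a Dockerfile that copies the Python backend and installs Python3 in the final image. It rests on several assumptions that you should verify: that the backend directory is named python, that the minimal image is Debian- or Ubuntu-based so that apt-get is available, and that the python3 package alone satisfies the backend. The helper name is hypothetical.

```shell
#!/bin/sh
# gen_python_backend_dockerfile VERSION
# Prints a Dockerfile for a minimum Triton plus the Python backend.
gen_python_backend_dockerfile() {
    version="$1"
    image="nvcr.io/nvidia/tritonserver:${version}"
    cat <<EOF
FROM ${image}-py3 as full
FROM ${image}-py3-min
COPY --from=full /opt/tritonserver/bin /opt/tritonserver/bin
COPY --from=full /opt/tritonserver/lib /opt/tritonserver/lib
# Assumption: the Python backend directory is named "python".
COPY --from=full /opt/tritonserver/backends/python /opt/tritonserver/backends/python
# Assumption: apt-get is available in the base image.
RUN apt-get update && apt-get install -y --no-install-recommends python3 && rm -rf /var/lib/apt/lists/*
EOF
}
```

The install step runs in the final stage, so Python3 ends up in the customized image itself and not only in the full image it was copied from.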