Triton is built using the build.py script. The build.py script supports both a Docker build and a non-Docker build.
- Build using Docker and the TensorFlow and PyTorch containers from NVIDIA GPU Cloud (NGC).
- Build without Docker.
The easiest way to build Triton is to use Docker. The result of the build will be a Docker image called tritonserver that will contain the tritonserver executable in /opt/tritonserver/bin and the required shared libraries in /opt/tritonserver/lib. The backends built for Triton will be in /opt/tritonserver/backends (note that as of the 20.11 release the TensorRT backend is still included in the core of Triton and so does not appear in /opt/tritonserver/backends).
Building with Docker ensures that all the correct CUDA, cuDNN, TensorRT and other dependencies are handled for you. A Docker build is the default when using build.py.
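Once a Docker build completes, you can confirm the layout described above by listing the image contents; a minimal sketch, assuming the build produced an image tagged tritonserver locally:

$ docker run --rm --entrypoint ls tritonserver /opt/tritonserver/bin /opt/tritonserver/backends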
By default no Triton features are enabled. The following build.py invocation builds all features and backends.
$ ./build.py --build-dir=/tmp/citritonbuild \
      --enable-logging --enable-stats --enable-tracing \
      --enable-metrics --enable-gpu-metrics --enable-gpu \
      --filesystem=gcs --filesystem=s3 \
      --endpoint=http --endpoint=grpc \
      --repo-tag=common:<container tag> --repo-tag=core:<container tag> --repo-tag=backend:<container tag> \
      --backend=custom --backend=ensemble --backend=tensorrt \
      --backend=identity:<container tag> --backend=repeat:<container tag> --backend=square:<container tag> \
      --backend=onnxruntime:<container tag> --backend=pytorch:<container tag> \
      --backend=tensorflow1:<container tag> --backend=tensorflow2:<container tag> \
      --backend=python:<container tag> --backend=dali:<container tag>
If you are building on the master/main branch then <container tag> should be set to "main". If you are building on a release branch you should set <container tag> to match. For example, if you are building on the r20.10 branch you should set <container tag> to be "r20.10". You can use a different <container tag> for a component to instead use the corresponding branch/tag in the build. For example, if you have a branch called "mybranch" in the identity_backend repo that you want to use in the build, you would specify --backend=identity:mybranch.
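For example, a condensed, hypothetical invocation for a build on the r20.10 release branch that also pulls the identity backend from a branch named mybranch (most of the feature and backend flags shown in the full invocation above are omitted here for brevity):

$ ./build.py --build-dir=/tmp/citritonbuild --enable-logging --enable-gpu \
      --repo-tag=common:r20.10 --repo-tag=core:r20.10 --repo-tag=backend:r20.10 \
      --backend=tensorrt --backend=identity:mybranch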
By default build.py clones Triton repos from https://github.com/triton-inference-server. Use the --github-organization option to select a different URL.
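For example, a sketch assuming you maintain forks of the Triton repos under a hypothetical organization at https://github.com/my-org, and assuming the option takes the organization URL as its value:

$ ./build.py --github-organization=https://github.com/my-org \
      --build-dir=/tmp/citritonbuild --enable-logging --backend=identity:main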
Each backend can also be built independently in its own backend repository. See the backend repo for more information.
To build Triton without using Docker, follow the build.py steps described above, except that you must also pass the --no-container-build flag to build.py.
When building without Docker you must install the necessary CUDA libraries and other dependencies needed for the build before invoking build.py.
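For example, a sketch of a non-Docker build with a reduced feature set, assuming the required dependencies are already installed on the system:

$ ./build.py --no-container-build --build-dir=/tmp/tritonbuild \
      --enable-logging --enable-stats --enable-gpu \
      --endpoint=http --endpoint=grpc \
      --backend=ensemble --backend=tensorrt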
For Triton to support NVIDIA GPUs you must install CUDA, cuBLAS and cuDNN. These libraries must be installed on system include and library paths so that they are available for the build. The version of the libraries used in the Dockerfile build can be found in the Framework Containers Support Matrix.
For a given version of Triton you can attempt to build with non-supported versions of the libraries but you may have build or execution issues since non-supported versions are not tested.
The TensorRT includes and libraries must be installed on system include and library paths so that they are available for the build. The version of TensorRT used in the Dockerfile build can be found in the Framework Containers Support Matrix.
For a given version of Triton you can attempt to build with non-supported versions of TensorRT but you may have build or execution issues since non-supported versions are not tested.
For instructions on how to build support for TensorFlow see the TensorFlow backend.
For instructions on how to build support for ONNX Runtime see the ONNX Runtime backend and the CMakeLists.txt file contained in that repo. You must have a version of the ONNX Runtime available on the build system and set the TRITON_ONNXRUNTIME_INCLUDE_PATHS and TRITON_ONNXRUNTIME_LIB_PATHS CMake variables appropriately.
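For example, a minimal sketch of configuring the ONNX Runtime backend build with CMake; the include and library paths are placeholders, and any additional variables required by the backend's CMakeLists.txt still apply:

$ mkdir build && cd build
$ cmake -DTRITON_ONNXRUNTIME_INCLUDE_PATHS=/opt/onnxruntime/include \
        -DTRITON_ONNXRUNTIME_LIB_PATHS=/opt/onnxruntime/lib ..
$ make install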