Triton is built using the build.py script, which supports both a Docker build and a non-Docker build:
- Build using Docker and the TensorFlow and PyTorch containers from NVIDIA GPU Cloud (NGC).
- Build without Docker.
The easiest way to build Triton is to use Docker. The result of the build will be a Docker image called tritonserver that contains the tritonserver executable in /opt/tritonserver/bin and the required shared libraries in /opt/tritonserver/lib. The backends built for Triton will be in /opt/tritonserver/backends (note that as of the 20.10 release the PyTorch and TensorRT backends are still included in the core of Triton and so do not appear in /opt/tritonserver/backends).
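As a quick sanity check after the build completes, you can list those directories from the produced image. This assumes the image was tagged tritonserver as described above; adjust the tag if yours differs:
$ docker run --rm tritonserver ls /opt/tritonserver/bin /opt/tritonserver/lib /opt/tritonserver/backends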
Building with Docker ensures that all the correct CUDA, cuDNN, TensorRT and other dependencies are handled for you. A Docker build is enabled by using the --container-version flag with build.py. By default no Triton features are enabled. The following build.py invocation builds all features and backends.
$ ./build.py --version=<version> --container-version=<container version> \
    --build-dir=/tmp/citritonbuild \
    --enable-logging --enable-stats --enable-tracing --enable-metrics --enable-gpu-metrics --enable-gpu \
    --filesystem=gcs --filesystem=s3 \
    --endpoint=http --endpoint=grpc \
    --repo-tag=common:<container tag> --repo-tag=core:<container tag> --repo-tag=backend:<container tag> \
    --backend=custom --backend=ensemble --backend=tensorrt --backend=pytorch \
    --backend=identity:<container tag> --backend=repeat:<container tag> --backend=square:<container tag> \
    --backend=onnxruntime:<container tag> --backend=tensorflow1:<container tag> --backend=tensorflow2:<container tag> \
    --backend=python:<container tag> --backend=dali:<container tag>
Where <version> is the version to assign to Triton and <container version> is the version to assign to the produced Docker image. Typically you will set <version> to something meaningful for your build and set <container version> to the value associated with the Triton version found in the VERSION file. You can find these associated values in CONTAINER_VERSION_MAP in build.py. For example, if the VERSION file contents are "2.4.0dev" then the build invocation should be:
$ ./build.py --version=0.0.0 --container-version=20.10dev ...
If you are building on the master/main branch then <container tag> should be set to "main". If you are building on a release branch you should set <container tag> to match. For example, if you are building on the r20.09 branch you should set <container tag> to be "r20.09". You can use a different <container tag> for a component to instead use the corresponding branch/tag in the build. For example, if you have a branch called "mybranch" in the identity_backend repo that you want to use in the build, you would specify --backend=identity:mybranch.
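For example, a build from the r20.09 release branch that pulls the identity backend from a hypothetical branch named "mybranch" might look like the following (abbreviated; add whatever feature and backend flags you need):
$ ./build.py --version=<version> --container-version=20.09 --repo-tag=common:r20.09 --repo-tag=core:r20.09 --repo-tag=backend:r20.09 --backend=identity:mybranch ...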
By default build.py clones Triton repos from https://github.com/triton-inference-server. Use the --github-organization option to select a different URL.
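For example, to clone the repos from a hypothetical fork or mirror named "my-fork" you might use:
$ ./build.py --github-organization=https://github.com/my-fork ...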
The backends can also be built independently in each of the backend repositories. See the backend repo for more information.
To build Triton without using Docker, follow the build.py steps described above except do not specify --container-version.
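For example, a non-Docker build sketch that mirrors the invocation shown earlier (the enabled features and backends here are just one possible selection):
$ ./build.py --version=<version> --build-dir=/tmp/citritonbuild --enable-logging --enable-stats --enable-gpu --endpoint=http --endpoint=grpc --backend=custom --backend=ensemble ...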
When building without Docker you must install the necessary CUDA libraries and other dependencies needed for the build before invoking build.py.
For Triton to support NVIDIA GPUs you must install CUDA, cuBLAS and cuDNN. These libraries must be installed on system include and library paths so that they are available for the build. The version of the libraries used in the Dockerfile build can be found in the Framework Containers Support Matrix.
For a given version of Triton you can attempt to build with non-supported versions of the libraries but you may have build or execution issues since non-supported versions are not tested.
The TensorRT includes and libraries must be installed on system include and library paths so that they are available for the build. The version of TensorRT used in the Dockerfile build can be found in the Framework Containers Support Matrix.
For a given version of Triton you can attempt to build with non-supported versions of TensorRT but you may have build or execution issues since non-supported versions are not tested.
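One way to make these dependencies visible to the build, assuming a typical install with CUDA under /usr/local/cuda (adjust the paths for your system), is to extend the standard compiler and linker search paths before invoking build.py. If cuDNN or TensorRT were installed somewhere other than a system location, add their include and lib directories in the same way:
$ export CPATH=/usr/local/cuda/include:$CPATH
$ export LIBRARY_PATH=/usr/local/cuda/lib64:$LIBRARY_PATH
$ export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH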
For instructions on how to build support for TensorFlow see the TensorFlow backend.
For instructions on how to build support for ONNX Runtime see the ONNX Runtime backend and the CMakeLists.txt file contained in that repo. You must have a version of the ONNX Runtime available on the build system and set the TRITON_ONNXRUNTIME_INCLUDE_PATHS and TRITON_ONNXRUNTIME_LIB_PATHS cmake variables appropriately.
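As an illustration, an out-of-source CMake configuration of the ONNX Runtime backend might point those variables at a local ONNX Runtime install like this (the /opt/onnxruntime prefix is only a placeholder for wherever your ONNX Runtime is installed):
$ mkdir build && cd build
$ cmake -DTRITON_ONNXRUNTIME_INCLUDE_PATHS=/opt/onnxruntime/include -DTRITON_ONNXRUNTIME_LIB_PATHS=/opt/onnxruntime/lib ..
$ make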