The Triton backend for TensorFlow. You can learn more about backends in the backend repo. Ask questions or report problems in the main Triton issues page.
Full documentation is included below but these shortcuts can help you get started in the right direction.
Be sure to read all the information below as well as the general Triton documentation available in the main server repo. If you don't find your answer there, you can ask questions on the main Triton issues page.
The TensorFlow backend supports both TensorFlow 1.x and 2.x. Each release of Triton will contain support for a specific 1.x and 2.x version. You can find the specific versions supported for any release by checking the Release Notes, which are available from the main server repo.
Each model's configuration can enable TensorFlow-specific optimizations. There are also a few command-line options that can be used to configure the backend when launching Triton.
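As one illustration, TensorFlow-TensorRT (TF-TRT) optimization can be requested in a model's config.pbtxt. The snippet below is a minimal sketch assuming the standard Triton model configuration schema; the FP16 precision mode is just an illustrative choice:

# Enable TF-TRT acceleration for this model (illustrative values)
optimization { execution_accelerators {
  gpu_execution_accelerator : [ {
    name : "tensorrt"
    parameters { key: "precision_mode" value: "FP16" } } ]
}}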
See build instructions below.
Currently you must use a version of TensorFlow from NGC. See custom TensorFlow build instructions below.
The command-line options configure properties of the TensorFlow backend that are then applied to all models that use the backend.
- --backend-config=tensorflow,allow-soft-placement=&lt;boolean&gt;: Instruct TensorFlow to use the CPU implementation of an operation when a GPU implementation is not available.
- --backend-config=tensorflow,gpu-memory-fraction=&lt;float&gt;: Reserve a portion of GPU memory for TensorFlow models. The default value 0.0 indicates that TensorFlow should dynamically allocate memory as needed. A value of 1.0 indicates that TensorFlow should allocate all of the GPU memory.
- --backend-config=tensorflow,version=&lt;int&gt;: Select the version of the TensorFlow library to be used. Available versions are 1 and 2. The default version is 1.
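For example, to launch Triton with the TensorFlow 2.x library selected and soft placement enabled (a sketch; the /models repository path is a placeholder):

$ tritonserver --model-repository=/models --backend-config=tensorflow,version=2 --backend-config=tensorflow,allow-soft-placement=true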
Use a recent cmake to build. First install the required dependencies.
$ apt-get install patchelf rapidjson-dev
The backend can be built to support either TensorFlow 1.x or TensorFlow 2.x. An appropriate TensorFlow container from NGC must be used. For example, to build a backend that uses the 20.12 version of the TensorFlow 1.x container from NGC:
$ mkdir build
$ cd build
$ cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install -DTRITON_TENSORFLOW_VERSION=1 -DTRITON_TENSORFLOW_DOCKER_IMAGE="nvcr.io/nvidia/tensorflow:20.12-tf1-py3" ..
$ make install
For example, to build a backend that uses the 20.12 version of the TensorFlow 2.x container from NGC:
$ mkdir build
$ cd build
$ cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install -DTRITON_TENSORFLOW_VERSION=2 -DTRITON_TENSORFLOW_DOCKER_IMAGE="nvcr.io/nvidia/tensorflow:20.12-tf2-py3" ..
$ make install
The following required Triton repositories will be pulled and used in the build. By default the "main" branch/tag will be used for each repo, but the listed CMake argument can be used to override it (see the example after this list).
- triton-inference-server/backend: -DTRITON_BACKEND_REPO_TAG=[tag]
- triton-inference-server/core: -DTRITON_CORE_REPO_TAG=[tag]
- triton-inference-server/common: -DTRITON_COMMON_REPO_TAG=[tag]
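For example, to pin all three repositories to a release branch matching the container version (the r20.12 tag shown here is illustrative):

$ cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install -DTRITON_TENSORFLOW_VERSION=1 -DTRITON_TENSORFLOW_DOCKER_IMAGE="nvcr.io/nvidia/tensorflow:20.12-tf1-py3" -DTRITON_BACKEND_REPO_TAG=r20.12 -DTRITON_CORE_REPO_TAG=r20.12 -DTRITON_COMMON_REPO_TAG=r20.12 ..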
Currently, Triton requires that a specially patched version of TensorFlow be used with the TensorFlow backend. The full source for these TensorFlow versions is available as Docker images from NGC. For example, the TensorFlow 1.x version compatible with the 20.12 release of Triton is available as nvcr.io/nvidia/tensorflow:20.12-tf1-py3 and the TensorFlow 2.x version compatible with the 20.12 release of Triton is available as nvcr.io/nvidia/tensorflow:20.12-tf2-py3.
You can modify and rebuild TensorFlow within these images to generate the shared libraries needed by the Triton TensorFlow backend. In the TensorFlow 1.x or TensorFlow 2.x container, rebuild using:
$ ./nvbuild.sh --python3.6 --triton
After rebuilding within the container you should save the updated container as a new Docker image (for example, by using docker commit), and then build the backend as described above with TRITON_TENSORFLOW_DOCKER_IMAGE set to refer to the new Docker image.
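A sketch of that workflow, assuming a running build container named tf1-devel and a new image name of tensorflow-triton-custom (both placeholders):

$ docker commit tf1-devel tensorflow-triton-custom
$ mkdir build
$ cd build
$ cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install -DTRITON_TENSORFLOW_VERSION=1 -DTRITON_TENSORFLOW_DOCKER_IMAGE="tensorflow-triton-custom" ..
$ make install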