The client libraries provide APIs that make it easy to communicate with Triton from your C++ or Python application. Using these libraries you can send either HTTP/REST or GRPC requests to Triton to access all its capabilities: inferencing, status and health, statistics and metrics, model repository management, etc. These libraries also support using system and CUDA shared memory for passing inputs to and receiving outputs from Triton. Examples show the use of both the C++ and Python libraries.
The easiest way to get the Python client library is to use pip to install the tritonclient module. You can also download both the C++ and Python client libraries from the Triton GitHub release page, or download a pre-built Docker image containing the client libraries from NVIDIA GPU Cloud (NGC).
It is also possible to build the client libraries with Docker or with cmake.
The GRPC and HTTP client libraries are available as a Python package that can be installed using a recent version of pip. Currently pip install is only available on Linux.
$ pip install nvidia-pyindex
$ pip install tritonclient[all]
Using all installs both the HTTP/REST and GRPC client libraries. Two optional packages, grpc and http, are available to install support for a single protocol. For example, to install only the HTTP/REST client library use:
$ pip install nvidia-pyindex
$ pip install tritonclient[http]
The components of the install packages are:
- http
- grpc [ service_pb2, service_pb2_grpc, model_config_pb2 ]
- utils [ linux distribution will include shared_memory and cuda_shared_memory ]
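One way to confirm which of these components a given installation actually provides is to try importing each one. The following is a minimal sketch, not part of the client library: the function name installed_tritonclient_parts is hypothetical, and catching every exception is an assumption made because a missing optional dependency may not surface as a plain ImportError.

```python
import importlib


def installed_tritonclient_parts(parts=("http", "grpc", "utils")):
    """Return {part: bool} showing whether tritonclient.<part> imports cleanly."""
    found = {}
    for part in parts:
        try:
            importlib.import_module(f"tritonclient.{part}")
            found[part] = True
        except Exception:
            # Assumption: any failure here means the component (or one of
            # its dependencies) is not usable in this environment.
            found[part] = False
    return found


for part, present in installed_tritonclient_parts().items():
    print(f"tritonclient.{part}: {'available' if present else 'not available'}")
```

A component reported as not available can usually be added by reinstalling with the matching optional package, for example tritonclient[http].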
The Linux version of the package also includes the perf_analyzer binary. The perf_analyzer binary is built on Ubuntu 20.04 and may not run on other Linux distributions. To run the perf_analyzer the following dependency must be installed:
$ sudo apt update
$ sudo apt install libb64-dev
The client libraries and the perf_analyzer executable can be downloaded from the Triton GitHub release page corresponding to the release you are interested in. The client libraries are found in the "Assets" section of the release page in a tar file named after the version of the release and the OS, for example, v2.3.0_ubuntu1804.clients.tar.gz.
The pre-built libraries can be used on the corresponding host system or you can install them into the Triton container to have both the clients and server in the same container.
$ mkdir clients
$ cd clients
$ wget https://github.com/triton-inference-server/server/releases/download/<tarfile_path>
$ tar xzf <tarfile_name>
After installing, the libraries can be found in lib/, the headers in include/, and the Python wheel files in python/. The bin/ and python/ directories contain the built examples that you can learn more about in Examples.
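To check that the archive unpacked as described, the directory layout can be verified with a few lines of Python. This is a sketch under the assumption that the four directories named above sit directly under the extraction directory; missing_client_dirs is a hypothetical helper name.

```python
from pathlib import Path

# Directories the text above says are present after extraction.
EXPECTED_DIRS = ("lib", "include", "python", "bin")


def missing_client_dirs(root):
    """Return the expected directories that are absent under root."""
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


# Example: check the current directory (e.g. after "cd clients").
missing = missing_client_dirs(".")
print("layout looks complete" if not missing else f"missing: {missing}")
```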
The perf_analyzer binary is built on Ubuntu 20.04 and may not run on other Linux distributions. To use the C++ libraries or perf_analyzer executable you must install some dependencies.
$ apt-get update
$ apt-get install curl libcurl4-openssl-dev libb64-dev
A Docker image containing the client libraries and examples is available from NVIDIA GPU Cloud (NGC). Before attempting to pull the container ensure you have access to NGC. For step-by-step instructions, see the NGC Getting Started Guide.
Use docker pull to get the client libraries and examples container from NGC.
$ docker pull nvcr.io/nvidia/tritonserver:<xx.yy>-py3-sdk
Where <xx.yy> is the version that you want to pull. Within the container the client libraries are in /workspace/install/lib, the corresponding headers in /workspace/install/include, and the Python wheel files in /workspace/install/python. The image will also contain the built client examples that you can learn more about in Examples.
To build the client libraries using Docker, first change directory to the root of the repo and check out the release version of the branch that you want to build (or the master branch if you want to build the under-development version). The branch you use for the client build should match the version of Triton you are using.
$ git checkout r21.03
Then, issue the following command to build the C++ client library and the Python wheel files for the Python client library.
$ docker build -t tritonserver_sdk -f Dockerfile.sdk .
You can optionally add --build-arg "BASE_IMAGE=<base_image>" to set the base image that you want the client library built against. This base image must be an Ubuntu CUDA image to be able to build CUDA shared memory support. If CUDA shared memory support is not required, you can use Ubuntu 20.04 as the base image.
After the build completes the tritonserver_sdk docker image will contain the built client libraries in /workspace/install/lib, the corresponding headers in /workspace/install/include, and the Python wheel files in /workspace/install/python. The image will also contain the built client examples that you can learn more about in Examples.
The client library build is performed using CMake.
IMPORTANT: Note that version 3.18.4 of cmake is needed to compile the client. The build dependencies and requirements are shown in Dockerfile.sdk. To build without Docker you must first install those dependencies along with the required cmake version. This section describes the client build for Ubuntu 20.04 and Windows 10 systems.
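Before starting a build it can help to check the installed cmake version. The sketch below parses the first line of the text printed by "cmake --version" and compares it with 3.18.4. Treating 3.18.4 as a minimum rather than an exact requirement is an assumption, and the function names are hypothetical.

```python
import re

REQUIRED_CMAKE = (3, 18, 4)


def parse_cmake_version(text):
    """Extract (major, minor, patch) from the output of `cmake --version`."""
    match = re.search(r"cmake version (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("could not find a cmake version in the given text")
    return tuple(int(group) for group in match.groups())


def cmake_is_new_enough(text, required=REQUIRED_CMAKE):
    """Assumption: any version at or above `required` is acceptable."""
    return parse_cmake_version(text) >= required


print(cmake_is_new_enough("cmake version 3.18.4"))
```

In practice the text would come from running cmake itself, for example with subprocess.run(["cmake", "--version"], capture_output=True, text=True).stdout.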
To build the libraries using CMake, first change directory to the root of the repo and check out the release version of the branch that you want to build (or the master branch if you want to build the under-development version).
$ git checkout r21.03
For Ubuntu, the dependencies and how to install them can be found in Dockerfile.sdk. The appropriate CUDA library must be installed unless TRITON_ENABLE_GPU=OFF is specified in the cmake invocation. Follow the Dockerfile closely up to the cmake invocation. Also note that dependency names may differ depending on the version of the system.
To build on Ubuntu, run the following to configure and build:
$ mkdir builddir && cd builddir
$ cmake -DCMAKE_BUILD_TYPE=Release ../build
$ make -j8 client
If you want to build a version of the client libraries and examples that does not include the CUDA shared memory support, use the following cmake configuration.
$ cmake -DTRITON_ENABLE_GPU=OFF -DTRITON_ENABLE_METRICS_GPU=OFF -DCMAKE_BUILD_TYPE=Release ../build
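If the build is driven from a script, the two Ubuntu configurations differ only in the GPU-related flags. The following sketch assembles the same command lines shown in this section; cmake_configure_args is a hypothetical helper, and only flags that appear in the commands above are used.

```python
import shlex


def cmake_configure_args(source_dir="../build", enable_gpu=True, build_type="Release"):
    """Build the cmake configure command as an argument list."""
    args = ["cmake"]
    if not enable_gpu:
        # Both flags are switched off together, as in the command above.
        args += ["-DTRITON_ENABLE_GPU=OFF", "-DTRITON_ENABLE_METRICS_GPU=OFF"]
    args += [f"-DCMAKE_BUILD_TYPE={build_type}", source_dir]
    return args


print(shlex.join(cmake_configure_args()))
print(shlex.join(cmake_configure_args(enable_gpu=False)))
```

The returned list can be passed directly to subprocess.run from inside the build directory.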
When the build completes the libraries can be found in client/install/lib, the corresponding headers in client/install/include, and the Python wheel files in client/install/python. The client/install directory will also contain the built client examples that you can learn more about in Examples.
For Windows, the dependencies can be installed using pip and vcpkg, a C++ library management tool for Windows. The following shows how to install the dependencies with these tools; you can also install them in any other way you prefer.
> .\vcpkg.exe install openssl:x64-windows zlib:x64-windows rapidjson:x64-windows
> .\pip.exe install --upgrade setuptools grpcio-tools wheel
The vcpkg step above installs openssl, zlib and rapidjson. The ":x64-windows" suffix specifies the target and is optional. The path to the libraries should be added to the "PATH" environment variable; by default it is \path\to\vcpkg\installed\<target>\bin. Update pip to get the proper wheel from PyPI. You may need to invoke pip.exe from a command line run as an administrator.
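The PATH update can be sketched as follows. Both helper names are hypothetical, the vcpkg root C:\tools\vcpkg is only an example location, and the target is passed explicitly rather than guessed.

```python
from pathlib import PureWindowsPath


def vcpkg_bin_dir(vcpkg_root, target):
    """Build the installed/<target>/bin directory under the vcpkg root."""
    return PureWindowsPath(vcpkg_root) / "installed" / target / "bin"


def prepend_to_path(path_value, directory):
    """Return a PATH string with directory first (';' separates entries on Windows)."""
    entries = [entry for entry in path_value.split(";") if entry]
    directory = str(directory)
    if directory not in entries:
        entries.insert(0, directory)
    return ";".join(entries)


# Example location only; substitute the real vcpkg root.
bin_dir = vcpkg_bin_dir(r"C:\tools\vcpkg", "x64-windows")
print(prepend_to_path(r"C:\Windows", bin_dir))
```

This only computes the new value. Applying it to the current process would mean assigning the result to os.environ["PATH"].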
To build the client for Windows, as there is no default build system available, you will need to specify the generator for CMake to match the build system you are using. For instance, if you are using Microsoft Visual Studio, you should do the following.
> cd build
> cmake -G"Visual Studio 16 2019" -DCMAKE_BUILD_TYPE=Release
> MSBuild.exe client.vcxproj -p:Configuration=Release
If you want to build a version of the client libraries and examples that does not include the CUDA shared memory support, use the following cmake configuration.
> cmake -G"Visual Studio 16 2019" -DTRITON_ENABLE_GPU=OFF -DTRITON_ENABLE_METRICS_GPU=OFF -DCMAKE_BUILD_TYPE=Release -DTRITON_COMMON_REPO_TAG:STRING=<tag> -DTRITON_CORE_REPO_TAG:STRING=<tag>
Where <tag> is "main" if you are building the clients from the master branch, or <tag> is "r<x>.<y>" if you are building on a release branch.
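The tag rule can be written down as a small helper. This is a sketch with a hypothetical function name; it only encodes the two cases stated above and rejects anything else.

```python
import re


def repo_tag_for_branch(branch):
    """Map the client branch being built to the <tag> value described above."""
    if branch == "master":
        return "main"
    if re.fullmatch(r"r\d+\.\d+", branch):
        # Release branches such as r21.03 use their own name as the tag.
        return branch
    raise ValueError(f"unrecognized branch name: {branch!r}")


tag = repo_tag_for_branch("r21.03")
print(f"-DTRITON_COMMON_REPO_TAG:STRING={tag} -DTRITON_CORE_REPO_TAG:STRING={tag}")
```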
When the build completes the libraries can be found in client\install\lib, the corresponding headers in client\install\include, and the Python wheel files in client\install\python. The client\install directory will also contain the built client Python examples that you can learn more about in Examples. At this time the Windows build does not include the C++ examples.
The MSBuild.exe may need to be invoked twice for a successful build.
The C++ client API exposes a class-based interface. The commented interface is available in grpc_client.h, http_client.h, and common.h.
The Python client API provides capabilities similar to those of the C++ API. The commented interface is available in grpc and http.
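As a first contact with the Python API, the sketch below asks a server for its health over HTTP/REST. It assumes the tritonclient HTTP package is installed and that a server might be listening at localhost:8000; that address is an assumption to adjust for your deployment, and probe_server is a hypothetical wrapper rather than a library function. If the package is missing it returns None, and if the server cannot be reached it reports both flags as False.

```python
def probe_server(url="localhost:8000"):
    """Return liveness/readiness flags, or None if the HTTP client is unavailable."""
    try:
        import tritonclient.http as httpclient
    except Exception:
        # The HTTP client (or one of its dependencies) is not installed.
        return None
    try:
        client = httpclient.InferenceServerClient(url=url)
        return {"live": client.is_server_live(), "ready": client.is_server_ready()}
    except Exception:
        # Assumption: any connection or protocol error means "not healthy".
        return {"live": False, "ready": False}


print(probe_server())
```

The GRPC package offers a client class with the same name in tritonclient.grpc, so the same pattern applies after changing the import and the port.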
Examples describes the example applications that demonstrate different parts of the client library APIs.