Clients can communicate with Triton using either the HTTP/REST or GRPC protocol, or through a C API.
Triton exposes both HTTP/REST and GRPC endpoints based on standard inference protocols that have been proposed by the KServe project. To fully enable all capabilities, Triton also implements a number of HTTP/REST and GRPC extensions to the KServe inference protocol.
The HTTP/REST and GRPC protocols provide endpoints to check server and model health, metadata, and statistics. Additional endpoints allow model loading and unloading, and inferencing. See the KServe and extension documentation for details.
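As a rough illustration of the endpoints mentioned above, the following Python sketch maps each kind of request to its HTTP/REST path under the KServe v2 protocol and Triton's statistics and model repository extensions. The base URL (localhost on port 8000), the helper names endpoint and is_ok, and the model name used in the demo are assumptions made for this example; the KServe and extension documentation remain the authoritative reference.

```python
import urllib.error
import urllib.request

# Assumption: the server's HTTP endpoint is reachable at this address.
BASE_URL = "http://localhost:8000"

# Paths that do not depend on a model name.
SERVER_PATHS = {
    "server_live": "/v2/health/live",
    "server_ready": "/v2/health/ready",
    "server_metadata": "/v2",
}

# Paths that are parameterized by a model name.
MODEL_PATHS = {
    "model_ready": "/v2/models/{m}/ready",
    "model_metadata": "/v2/models/{m}",
    "model_stats": "/v2/models/{m}/stats",             # statistics extension
    "model_load": "/v2/repository/models/{m}/load",     # repository extension (POST)
    "model_unload": "/v2/repository/models/{m}/unload", # repository extension (POST)
    "infer": "/v2/models/{m}/infer",                    # inference (POST)
}


def endpoint(kind: str, model: str = "") -> str:
    """Return the URL path for the given endpoint kind."""
    if kind in SERVER_PATHS:
        return SERVER_PATHS[kind]
    if kind not in MODEL_PATHS:
        raise ValueError(f"unknown endpoint kind: {kind}")
    if not model:
        raise ValueError(f"endpoint kind {kind!r} needs a model name")
    return MODEL_PATHS[kind].format(m=model)


def is_ok(path: str, base_url: str = BASE_URL, timeout: float = 2.0) -> bool:
    """Issue a GET and report whether the server answered with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    # "my_model" is a hypothetical model name used only for illustration.
    for kind in ("server_live", "server_ready", "model_ready", "model_stats"):
        print(kind, "->", endpoint(kind, "my_model"))
```

With a server running, a call such as is_ok(endpoint("server_ready")) would report whether the readiness endpoint returned 200. The load, unload, and infer paths expect POST requests and are listed here only to show the path layout.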
Triton provides the following configuration options for server-client network transactions over the HTTP protocol.
Triton supports on-wire compression of HTTP requests and responses through its clients. See HTTP Compression for more details.
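To make the idea of on-wire compression concrete, here is a minimal standard-library sketch that gzips a JSON inference request body and sets the matching HTTP headers. It does not use the Triton client library; the URL, the model name, the tensor name INPUT0, and the helper name build_compressed_request are all hypothetical, and the request is only constructed, not sent.

```python
import gzip
import json
import urllib.request


def build_compressed_request(url: str, payload: dict):
    """Build a POST request whose JSON body is gzip-compressed.

    Returns the request plus the raw and compressed body sizes in bytes.
    """
    raw = json.dumps(payload).encode("utf-8")
    body = gzip.compress(raw)
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Content-Encoding": "gzip",  # the body is gzip-compressed
            "Accept-Encoding": "gzip",   # a compressed response is acceptable
        },
    )
    return request, len(raw), len(body)


if __name__ == "__main__":
    # Hypothetical request in the KServe v2 JSON layout: one FP32 input tensor.
    payload = {
        "inputs": [
            {
                "name": "INPUT0",
                "shape": [1, 1024],
                "datatype": "FP32",
                "data": [0.0] * 1024,
            }
        ]
    }
    url = "http://localhost:8000/v2/models/my_model/infer"  # assumed address
    request, raw_size, compressed_size = build_compressed_request(url, payload)
    print(f"raw body: {raw_size} bytes, gzip body: {compressed_size} bytes")
```

How much compression helps depends on the payload: repetitive JSON such as the example shrinks considerably, while already dense binary tensors may gain little and still cost CPU time on both ends.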
Triton exposes various GRPC parameters for configuring the server-client network transactions. For usage of these options, refer to the output from tritonserver --help.
These options can be used to configure a secured channel for communication. The server-side options include:
--grpc-use-ssl
--grpc-use-ssl-mutual
--grpc-server-cert
--grpc-server-key
--grpc-root-cert
For client-side documentation, see Client-Side GRPC SSL/TLS.
For an overview of authentication in gRPC, refer to the gRPC authentication documentation.
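The server-side SSL options listed above could be assembled into a launch command along the following lines. This is a sketch under stated assumptions: the helper name grpc_ssl_args, the certificate file names, the --flag=value spelling with "true" as the boolean value, and the rule that mutual authentication requires a root certificate are choices made for this example. Check the output of tritonserver --help for the exact syntax accepted by your version.

```python
import shlex


def grpc_ssl_args(server_cert: str, server_key: str,
                  root_cert: str = "", mutual: bool = False) -> list:
    """Return tritonserver arguments that enable a secured GRPC channel."""
    args = [
        "--grpc-use-ssl=true",
        f"--grpc-server-cert={server_cert}",
        f"--grpc-server-key={server_key}",
    ]
    if mutual:
        # Assumption of this sketch: verifying client certificates needs a
        # root certificate, so refuse to build the command without one.
        if not root_cert:
            raise ValueError("mutual authentication requires root_cert")
        args.append("--grpc-use-ssl-mutual=true")
    if root_cert:
        args.append(f"--grpc-root-cert={root_cert}")
    return args


if __name__ == "__main__":
    # The file names below are placeholders, not paths shipped with Triton.
    command = ["tritonserver"] + grpc_ssl_args(
        "server.crt", "server.key", root_cert="ca.crt", mutual=True
    )
    print(shlex.join(command))
```

Building the argument list in one place keeps the certificate paths and the SSL switches consistent, which is harder to guarantee when the flags are typed by hand across several deployment scripts.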
Triton allows on-wire compression of request and response messages by exposing the following option on the server side:
--grpc-infer-response-compression-level
For client-side documentation, see Client-Side GRPC Compression.
Compression reduces the bandwidth used in server-client communication. For more details, see gRPC Compression.
Triton exposes GRPC KeepAlive parameters, with the default values for both client and server described in the gRPC keepalive documentation.
These options can be used to configure the KeepAlive settings:
--grpc-keepalive-time
--grpc-keepalive-timeout
--grpc-keepalive-permit-without-calls
--grpc-http2-max-pings-without-data
--grpc-http2-min-recv-ping-interval-without-data
--grpc-http2-max-ping-strikes
For client-side documentation, see Client-Side GRPC KeepAlive.
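One way to keep the six KeepAlive settings together is a small configuration object that renders the server flags listed above. The class name GrpcKeepAlive, its field names, and the numeric values are illustrative only and are not Triton's defaults; the assumption that time values are given in milliseconds and that the boolean is spelled true/false should be confirmed against tritonserver --help.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GrpcKeepAlive:
    """Server-side KeepAlive settings (time fields assumed to be milliseconds)."""
    time_ms: int
    timeout_ms: int
    permit_without_calls: bool
    max_pings_without_data: int
    min_recv_ping_interval_without_data_ms: int
    max_ping_strikes: int

    def to_args(self) -> list:
        """Render the settings as tritonserver command-line arguments."""
        permit = "true" if self.permit_without_calls else "false"
        return [
            f"--grpc-keepalive-time={self.time_ms}",
            f"--grpc-keepalive-timeout={self.timeout_ms}",
            f"--grpc-keepalive-permit-without-calls={permit}",
            f"--grpc-http2-max-pings-without-data={self.max_pings_without_data}",
            "--grpc-http2-min-recv-ping-interval-without-data="
            f"{self.min_recv_ping_interval_without_data_ms}",
            f"--grpc-http2-max-ping-strikes={self.max_ping_strikes}",
        ]


if __name__ == "__main__":
    # Illustrative values chosen for the example, not recommendations.
    settings = GrpcKeepAlive(
        time_ms=10000,
        timeout_ms=5000,
        permit_without_calls=True,
        max_pings_without_data=0,
        min_recv_ping_interval_without_data_ms=5000,
        max_ping_strikes=2,
    )
    for arg in settings.to_args():
        print(arg)
```

Client and server KeepAlive settings interact: a client that pings more often than the server's minimum receive interval permits may accumulate ping strikes, so it is worth choosing the two sides' values together.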
The Triton Inference Server provides a backwards-compatible C API that allows Triton to be linked directly into a C/C++ application. The API is documented in tritonserver.h.
A simple example using the C API can be found in simple.cc. A more complicated example can be found in the source that implements the HTTP/REST and GRPC endpoints for Triton. These endpoints use the C API to communicate with the core of Triton. The primary source files for the endpoints are grpc_server.cc and http_server.cc.