Clients can communicate with Triton using an HTTP/REST protocol, a GRPC protocol, or a C API.
Triton exposes both HTTP/REST and GRPC endpoints based on standard inference protocols that have been proposed by the KFServing project. To fully enable all capabilities, Triton also implements a number of HTTP/REST and GRPC extensions to the KFServing inference protocol.
The HTTP/REST and GRPC protocols provide endpoints to check server and model health, metadata, and statistics. Additional endpoints allow model loading and unloading, and inference. See the KFServing and extension documentation for details.
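As a rough illustration of how the HTTP/REST endpoints are laid out, the sketch below builds request paths and a minimal inference request body following the KFServing v2 protocol and Triton's statistics and model repository extensions. The host and port (`localhost:8000`), the model name `my_model`, and the tensor names `INPUT0` and `OUTPUT0` are assumptions made for illustration. The helper functions are hypothetical and are not part of any Triton client library. Check the paths against the KFServing and extension documentation before relying on them.

```python
import json

# Assumption: Triton's HTTP/REST endpoint is reachable at this address.
BASE_URL = "http://localhost:8000"


def model_path(name, version=None):
    """Return the path prefix for a model, optionally pinned to a version."""
    path = f"/v2/models/{name}"
    if version is not None:
        path += f"/versions/{version}"
    return path


def endpoints(name, version=None):
    """Map each operation to its (HTTP method, path) pair (hypothetical helper)."""
    base = model_path(name, version)
    return {
        # Health and metadata (KFServing v2 protocol)
        "server_live": ("GET", "/v2/health/live"),
        "server_ready": ("GET", "/v2/health/ready"),
        "server_metadata": ("GET", "/v2"),
        "model_ready": ("GET", f"{base}/ready"),
        "model_metadata": ("GET", base),
        # Inference (KFServing v2 protocol)
        "infer": ("POST", f"{base}/infer"),
        # Statistics and model repository control (Triton extensions)
        "model_stats": ("GET", f"{base}/stats"),
        "load": ("POST", f"/v2/repository/models/{name}/load"),
        "unload": ("POST", f"/v2/repository/models/{name}/unload"),
    }


def infer_request(input_name, shape, datatype, data, output_name):
    """Build a minimal JSON-serializable inference request body."""
    return {
        "inputs": [
            {"name": input_name, "shape": shape, "datatype": datatype, "data": data}
        ],
        "outputs": [{"name": output_name}],
    }


if __name__ == "__main__":
    # "my_model", "INPUT0", and "OUTPUT0" are placeholder names.
    method, path = endpoints("my_model")["infer"]
    body = infer_request("INPUT0", [1, 4], "FP32", [1.0, 2.0, 3.0, 4.0], "OUTPUT0")
    print(method, BASE_URL + path)
    print(json.dumps(body))
```

The sketch only constructs the requests. Any HTTP client can send them, and the same operations are available through the GRPC protocol.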
The Triton Inference Server provides a backwards-compatible C API that allows Triton to be linked directly into a C/C++ application. The API is documented in tritonserver.h.
A simple example using the C API can be found in simple.cc. A more complicated example can be found in the source that implements the HTTP/REST and GRPC endpoints for Triton. These endpoints use the C API to communicate with the core of Triton. The primary source files for the endpoints are grpc_server.cc and http_server.cc.