 -->
 [![License](https://img.shields.io/badge/License-BSD3-lightgrey.svg)](https://opensource.org/licenses/BSD-3-Clause)
 
-> [!WARNING]
-> You are currently on the `main` branch which tracks under-development progress
-> towards the next release. The current release is version [2.60.0](https://github.com/triton-inference-server/server/releases/latest)
-> and corresponds to the 25.08 container release on NVIDIA GPU Cloud (NGC).
-
 # Triton Inference Server
 
 Triton Inference Server is an open source inference serving software that
@@ -61,7 +56,7 @@ Major features include:
 - Provides [Backend API](https://github.com/triton-inference-server/backend) that
   allows adding custom backends and pre/post processing operations
 - Supports writing custom backends in python, a.k.a.
-  [Python-based backends.](https://github.com/triton-inference-server/backend/blob/main/docs/python_based_backends.md#python-based-backends)
+  [Python-based backends.](https://github.com/triton-inference-server/backend/blob/r25.09/docs/python_based_backends.md#python-based-backends)
 - Model pipelines using
   [Ensembling](docs/user_guide/architecture.md#ensemble-models) or [Business
   Logic Scripting
@@ -90,16 +85,16 @@ Inference Server with the
 
 ```bash
 # Step 1: Create the example model repository
-git clone -b r25.08 https://github.com/triton-inference-server/server.git
+git clone -b r25.09 https://github.com/triton-inference-server/server.git
 cd server/docs/examples
 ./fetch_models.sh
 
 # Step 2: Launch triton from the NGC Triton container
-docker run --gpus=1 --rm --net=host -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:25.08-py3 tritonserver --model-repository=/models --model-control-mode explicit --load-model densenet_onnx
+docker run --gpus=1 --rm --net=host -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:25.09-py3 tritonserver --model-repository=/models --model-control-mode explicit --load-model densenet_onnx
 
 # Step 3: Sending an Inference Request
 # In a separate console, launch the image_client example from the NGC Triton SDK container
-docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:25.08-py3-sdk /workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg
+docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:25.09-py3-sdk /workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg
 
 # Inference should return the following
 Image '/workspace/images/mug.jpg':
@@ -172,10 +167,10 @@ configuration](docs/user_guide/model_configuration.md) for the model.
   [Python](https://github.com/triton-inference-server/python_backend), and more
 - Not all the above backends are supported on every platform supported by Triton.
   Look at the
-  [Backend-Platform Support Matrix](https://github.com/triton-inference-server/backend/blob/main/docs/backend_platform_support_matrix.md)
+  [Backend-Platform Support Matrix](https://github.com/triton-inference-server/backend/blob/r25.09/docs/backend_platform_support_matrix.md)
   to learn which backends are supported on your target platform.
 - Learn how to [optimize performance](docs/user_guide/optimization.md) using the
-  [Performance Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
+  [Performance Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/r25.09/README.md)
   and
   [Model Analyzer](https://github.com/triton-inference-server/model_analyzer)
 - Learn how to [manage loading and unloading models](docs/user_guide/model_management.md) in
@@ -189,14 +184,14 @@ A Triton *client* application sends inference and other requests to Triton. The
 [Python and C++ client libraries](https://github.com/triton-inference-server/client)
 provide APIs to simplify this communication.
 
-- Review client examples for [C++](https://github.com/triton-inference-server/client/blob/main/src/c%2B%2B/examples),
-  [Python](https://github.com/triton-inference-server/client/blob/main/src/python/examples),
-  and [Java](https://github.com/triton-inference-server/client/blob/main/src/java/src/main/java/triton/client/examples)
+- Review client examples for [C++](https://github.com/triton-inference-server/client/blob/r25.09/src/c%2B%2B/examples),
+  [Python](https://github.com/triton-inference-server/client/blob/r25.09/src/python/examples),
+  and [Java](https://github.com/triton-inference-server/client/blob/r25.09/src/java/src/main/java/triton/client/examples)
 - Configure [HTTP](https://github.com/triton-inference-server/client#http-options)
   and [gRPC](https://github.com/triton-inference-server/client#grpc-options)
   client options
 - Send input data (e.g. a jpeg image) directly to Triton in the [body of an HTTP
-  request without any additional metadata](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_binary_data.md#raw-binary-request)
+  request without any additional metadata](https://github.com/triton-inference-server/server/blob/r25.09/docs/protocol/extension_binary_data.md#raw-binary-request)
 
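The client repositories linked above contain complete, buildable examples. As a quick orientation only, the sketch below shows the general shape of an HTTP inference call with the Python `tritonclient` package against the quickstart server; the tensor names `data_0` and `fc6_1` are assumptions taken from the `densenet_onnx` example model, so verify them against the model's `config.pbtxt` (or the model metadata endpoint) before reusing this.

```python
# Minimal sketch: check readiness and run one inference over HTTP.
# Assumes `pip install tritonclient[http] numpy` and a server started as in the
# quickstart above. Tensor names "data_0" / "fc6_1" are assumptions taken from
# the densenet_onnx example configuration.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Fail fast if the server or the model is not ready.
assert client.is_server_ready()
assert client.is_model_ready("densenet_onnx")

# Build a single FP32 input of shape [3, 224, 224]; random data stands in for a
# preprocessed image.
image = np.random.rand(3, 224, 224).astype(np.float32)
infer_input = httpclient.InferInput("data_0", list(image.shape), "FP32")
infer_input.set_data_from_numpy(image)

# Request the classification output and inspect the result as a numpy array.
result = client.infer(
    model_name="densenet_onnx",
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("fc6_1")],
)
scores = result.as_numpy("fc6_1")
print("output shape:", scores.shape)
```

The gRPC client (`tritonclient.grpc`) follows the same pattern on the server's gRPC port (8001 by default), and because the quickstart starts Triton with `--model-control-mode explicit`, the same client object can also call `client.load_model(...)` and `client.unload_model(...)`.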
 ### Extend Triton
 
@@ -205,7 +200,7 @@ designed for modularity and flexibility
 
 - [Customize Triton Inference Server container](docs/customization_guide/compose.md) for your use case
 - [Create custom backends](https://github.com/triton-inference-server/backend)
-  in either [C/C++](https://github.com/triton-inference-server/backend/blob/main/README.md#triton-backend-api)
+  in either [C/C++](https://github.com/triton-inference-server/backend/blob/r25.09/README.md#triton-backend-api)
   or [Python](https://github.com/triton-inference-server/python_backend)
 - Create [decoupled backends and models](docs/user_guide/decoupled_models.md) that can send
   multiple responses for a request or not send any responses for a request
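Both backend routes are documented in the linked repositories. As a rough sketch of what the Python route expects, the `model.py` below implements an identity model; it assumes the layout described in the python_backend repository (`<model_repository>/<model_name>/1/model.py` next to a `config.pbtxt`) and placeholder tensor names `INPUT0`/`OUTPUT0` that would have to match that configuration.

```python
# Minimal sketch of a Python backend model.py: copy INPUT0 to OUTPUT0.
# "INPUT0" / "OUTPUT0" are placeholder names that must match the model's
# config.pbtxt.
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # args carries the model configuration and instance information; a real
        # backend would typically parse args["model_config"] here.
        pass

    def execute(self, requests):
        # Return one response per request, in the same order.
        responses = []
        for request in requests:
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            out0 = pb_utils.Tensor("OUTPUT0", in0.as_numpy())
            responses.append(pb_utils.InferenceResponse(output_tensors=[out0]))
        return responses

    def finalize(self):
        # Called once when the model is unloaded.
        pass
```

Decoupled models (last bullet above) do not return responses from `execute` this way; they obtain a response sender from each request and may emit zero or more responses per request, as described in the decoupled models guide.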