
English | 简体中文

Triton Paddle Backend

Table of Contents

  • Quick Start
      • Pull Image
      • Create A Model Repository
      • Launch Triton Inference Server
      • Verify Triton Is Running Correctly
  • Examples
  • Performance

Quick Start

Pull Image

docker pull paddlepaddle/triton_paddle:21.10

Note: Only the Triton Inference Server 21.10 image is supported.

Create A Model Repository

The model repository is the directory where you place the models that you want Triton to serve. An example model repository is included in the examples directory. Before using the repository, fetch the example models with the following script.

$ cd examples
$ ./fetch_models.sh
$ cd .. # back to root of paddle_backend
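
After fetching, the repository follows Triton's standard layout: one directory per model, a numeric version subdirectory holding the model files, and a config.pbtxt. A minimal sketch of what to expect (the model name, file names, and backend string below are illustrative assumptions; check the fetched files for the exact values):

examples/models/
└── resnet50_v1.5            # one directory per model (name assumed)
    ├── 1                    # numeric model version
    │   ├── model.pdmodel    # Paddle inference program (file name assumed)
    │   └── model.pdiparams  # Paddle weights (file name assumed)
    └── config.pbtxt         # declares backend: "paddle" (assumed), inputs, outputs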

Launch Triton Inference Server

  1. Launch a container from the image
$ docker run --gpus=all --rm -it --name triton_server --net=host -e CUDA_VISIBLE_DEVICES=0 \
           -v `pwd`/examples/models:/workspace/models \
           paddlepaddle/triton_paddle:21.10 /bin/bash
  2. Launch the Triton Inference Server inside the container
/opt/tritonserver/bin/tritonserver --model-repository=/workspace/models

Note: Run /opt/tritonserver/bin/tritonserver --help to list all available parameters.
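
If the interactive shell is not needed, the two steps can be combined by passing the server command directly to docker run (same image and mount as above):

$ docker run --gpus=all --rm --net=host -e CUDA_VISIBLE_DEVICES=0 \
           -v `pwd`/examples/models:/workspace/models \
           paddlepaddle/triton_paddle:21.10 \
           /opt/tritonserver/bin/tritonserver --model-repository=/workspace/models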

Verify Triton Is Running Correctly

Use Triton’s ready endpoint to verify that the server and the models are ready for inference. From the host system, use curl to access the HTTP endpoint that reports server status.

$ curl -v localhost:8000/v2/health/ready
...
< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain

The HTTP request returns status 200 if Triton is ready and non-200 if it is not ready.
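
Beyond the ready endpoint, Triton's HTTP/REST API also exposes liveness, server metadata, and per-model readiness checks; <model_name> below is a placeholder for a model in your repository:

$ curl -v localhost:8000/v2/health/live
$ curl localhost:8000/v2
$ curl localhost:8000/v2/models/<model_name>/ready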

Examples

Before running the examples, make sure the Triton server is running correctly (see the previous section).

Change working directory to examples

$ cd examples

ERNIE Base

ERNIE-2.0 is a pre-training framework for language understanding.

Steps to run the benchmark on ERNIE

$ bash perf_ernie.sh
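
The script presumably drives Triton's perf_analyzer client, which produces throughput and latency-percentile tables like the ones in the Performance section below. A manual invocation along the same lines might look like this (the model name, batch size, and concurrency range are assumptions, not values taken from the script):

$ perf_analyzer -m ernie -b 1 -u localhost:8000 --concurrency-range 1:4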

ResNet50 v1.5

ResNet50-v1.5 is a modified version of the original ResNet50 v1 model.

Steps to run the benchmark on ResNet50-v1.5

$ bash perf_resnet50_v1.5.sh

Steps to run inference on ResNet50-v1.5

  1. Prepare processed images following DeepLearningExamples and place the imagenet folder under the examples directory.

  2. Run the inference

$ bash infer_resnet_v1.5.sh imagenet/<id>
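
To run inference over several preprocessed image folders in one go, a simple loop works (this assumes imagenet contains one subdirectory per image id, as the <id> argument above implies):

$ for id in $(ls imagenet | head -3); do bash infer_resnet_v1.5.sh imagenet/${id}; done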

Performance

ERNIE Base (T4)

| Precision | Backend Accelerator | Client Batch Size | Sequences/second | P90 Latency (ms) | P95 Latency (ms) | P99 Latency (ms) | Avg Latency (ms) |
|-----------|---------------------|-------------------|------------------|------------------|------------------|------------------|------------------|
| FP16      | TensorRT            | 1                 | 270.0            | 3.813            | 3.846            | 4.007            | 3.692            |
| FP16      | TensorRT            | 2                 | 500.4            | 4.282            | 4.332            | 4.709            | 3.980            |
| FP16      | TensorRT            | 4                 | 831.2            | 5.141            | 5.242            | 5.569            | 4.797            |
| FP16      | TensorRT            | 8                 | 1128.0           | 7.788            | 7.949            | 8.255            | 7.089            |
| FP16      | TensorRT            | 16                | 1363.2           | 12.702           | 12.993           | 13.507           | 11.738           |
| FP16      | TensorRT            | 32                | 1529.6           | 22.495           | 22.817           | 24.634           | 20.901           |

ResNet50 v1.5 (V100-SXM2-16G)

| Precision | Backend Accelerator | Client Batch Size | Images/second | P90 Latency (ms) | P95 Latency (ms) | P99 Latency (ms) | Avg Latency (ms) |
|-----------|---------------------|-------------------|---------------|------------------|------------------|------------------|------------------|
| FP16      | TensorRT            | 1                 | 288.8         | 3.494            | 3.524            | 3.608            | 3.462            |
| FP16      | TensorRT            | 2                 | 494.0         | 4.083            | 4.110            | 4.208            | 4.047            |
| FP16      | TensorRT            | 4                 | 758.4         | 5.327            | 5.359            | 5.460            | 5.273            |
| FP16      | TensorRT            | 8                 | 1044.8        | 7.728            | 7.770            | 7.949            | 7.658            |
| FP16      | TensorRT            | 16                | 1267.2        | 12.742           | 12.810           | 13.883           | 12.647           |
| FP16      | TensorRT            | 32                | 1113.6        | 28.840           | 29.044           | 30.357           | 28.641           |
| FP16      | TensorRT            | 64                | 1100.8        | 58.512           | 58.642           | 59.967           | 58.251           |
| FP16      | TensorRT            | 128               | 1049.6        | 121.371          | 121.834          | 123.371          | 119.991          |

ResNet50 v1.5 (T4)

| Precision | Backend Accelerator | Client Batch Size | Images/second | P90 Latency (ms) | P95 Latency (ms) | P99 Latency (ms) | Avg Latency (ms) |
|-----------|---------------------|-------------------|---------------|------------------|------------------|------------------|------------------|
| FP16      | TensorRT            | 1                 | 291.8         | 3.471            | 3.489            | 3.531            | 3.427            |
| FP16      | TensorRT            | 2                 | 466.0         | 4.323            | 4.336            | 4.382            | 4.288            |
| FP16      | TensorRT            | 4                 | 665.6         | 6.031            | 6.071            | 6.142            | 6.011            |
| FP16      | TensorRT            | 8                 | 833.6         | 9.662            | 9.684            | 9.767            | 9.609            |
| FP16      | TensorRT            | 16                | 899.2         | 18.061           | 18.208           | 18.899           | 17.748           |
| FP16      | TensorRT            | 32                | 761.6         | 42.333           | 43.456           | 44.167           | 41.740           |
| FP16      | TensorRT            | 64                | 793.6         | 79.860           | 80.410           | 80.807           | 79.680           |
| FP16      | TensorRT            | 128               | 793.6         | 158.207          | 158.278          | 158.643          | 157.543          |
