A rust based service for serving machine learning predictions.
- Supports REST and gRPC
- Concurrency model: thread per model
- Supports per model queues via async channels
- Onnxruntime backend
In order to build the gRPC server, you'l need the protoc Protocol Buffers compiler. See the tonic documentation to set this up.
Onnxruntime is the engine we use to power inference. See onnxruntime docs for more complete installation instructions.
brew install onnxruntimeExport the lib path:
export ORT_STRATEGY=system
export ORT_LIB_LOCATION=$(brew --prefix onnxruntime)On Linux, you'll need to download the tarball
wget https://github.com/microsoft/onnxruntime/releases/download/v1.14.1/onnxruntime-linux-x64-1.14.1.tgz
tar zxf onnxruntime-linux-x64-1.14.1.tgz
mv onnxruntime-linux-x64-1.14.1 /lib/onnxruntimeexport ORT_STRATEGY=system
export ORT_LIB_LOCATION=/lib/onnxruntime
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib/onnxruntime/libDownload the test models:
curl -LO "https://github.com/onnx/models/raw/main/vision/classification/squeezenet/model/squeezenet1.0-8.onnx"
curl -LO "https://github.com/onnx/models/raw/main/vision/object_detection_segmentation/mask-rcnn/model/MaskRCNN-10.onnx"Start the service
cargo runSee the deployed models:
curl -X GET -H "Content-Type: application/json" http://localhost:8080/modelsThe client script runs many concurrent requests against the server and logs the elapsed for each request and the total time. To run the client:
cargo run --bin clientWe also provide the same client for gRPC:
cargo run --bin grpc-clientRunning locally on an M1 Macbook pro, gRPC performs better. The gap is marginal for MaskRCNN where compute is a much larger proportion of the response time
| Client Type | Model | Total Time | Average Time | Median Time | p95 Time | p99 Time |
|---|---|---|---|---|---|---|
| client | squeezenet | 1.225398832s | 136.155425ms | 143.143708ms | 198.443083ms | 198.443083ms |
| grpc-client | squeezenet | 182.413958ms | 18.241395ms | 17.662917ms | 20.213208ms | 20.213208ms |
| client | maskrcnn | 76.612492085s | 6.964772007s | 6.975712042s | 11.401473875s | 12.505639542s |
| grpc-client | maskrcnn | 57.749369832s | 5.774936983s | 5.363160166s | 10.23204925s | 10.23204925s |