OneFlow Backend For Triton Inference Server

Currently, we have implemented a OneFlow backend for the Triton Inference Server that enables model serving.
This tutorial shows how to export a model and deploy it with Triton. Follow the instructions below to get started; note that you need to build the Docker image before you begin.
- Download and save model
```shell
cd examples/resnet50/
python3 export_model.py  # writes the model into the Triton model repository layout
```
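For reference, the export script wraps the model in a static graph and saves it in the layout Triton expects. The sketch below is only a rough approximation, assuming a pretrained ResNet-50 from flowvision; the dummy input shape, the output path, and the use of oneflow.save on a compiled nn.Graph are assumptions, so treat examples/resnet50/export_model.py as the authoritative version.

```python
# Hypothetical sketch of a model export script; see export_model.py for the real one.
import oneflow as flow
import flowvision

model = flowvision.models.resnet50(pretrained=True)
model.eval()

class ResNet50Graph(flow.nn.Graph):
    def __init__(self):
        super().__init__()
        self.model = model

    def build(self, x):
        return self.model(x)

graph = ResNet50Graph()
# Run once with a dummy batch to trigger graph compilation.
graph(flow.randn(1, 3, 224, 224))

# Triton expects <repository>/<model_name>/<version>/model; path is an assumption.
# Assumption: oneflow.save can serialize a compiled nn.Graph for the serving backend.
flow.save(graph, "resnet50/1/model")
```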
- Launch the Triton server
```shell
cd ../../  # back to the root of the serving repository
docker run --rm --runtime=nvidia --network=host -v $(pwd)/examples:/models \
  serving:final
curl -v localhost:8000/v2/health/ready  # readiness check
```
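If you prefer to probe readiness from Python, the snippet below does the same as the curl check through the tritonclient package (installed in the next step). The model name resnet50 is an assumption based on the examples/resnet50 directory.

```python
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
print("server ready:", client.is_server_ready())
# "resnet50" is assumed from the examples/resnet50 directory name.
print("model ready:", client.is_model_ready("resnet50"))
```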
- Send an image and get a prediction
```shell
pip3 install 'tritonclient[all]'  # quoted so the shell does not expand the brackets
cd examples/resnet50/
curl -o cat.jpg https://images.pexels.com/photos/156934/pexels-photo-156934.jpeg
python3 client.py --image cat.jpg
```
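To give an idea of what the client does, here is a hedged sketch of a minimal Triton HTTP client: it preprocesses the image into an NCHW float32 batch and sends an inference request. The tensor names INPUT_0 and OUTPUT_0, the model name resnet50, and the normalization constants are assumptions; the real names come from the model's config.pbtxt, and client.py is the reference implementation.

```python
# Hypothetical minimal client; requires pillow in addition to tritonclient[all].
import numpy as np
from PIL import Image
import tritonclient.http as httpclient

# Preprocess: resize to 224x224, normalize, convert to NCHW float32 with batch dim.
img = Image.open("cat.jpg").convert("RGB").resize((224, 224))
x = np.asarray(img, dtype=np.float32) / 255.0
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
x = ((x - mean) / std).transpose(2, 0, 1)[np.newaxis, ...]

client = httpclient.InferenceServerClient(url="localhost:8000")
# Tensor and model names below are assumptions; check config.pbtxt for the real ones.
inputs = [httpclient.InferInput("INPUT_0", list(x.shape), "FP32")]
inputs[0].set_data_from_numpy(x)
outputs = [httpclient.InferRequestedOutput("OUTPUT_0")]
result = client.infer("resnet50", inputs, outputs=outputs)
logits = result.as_numpy("OUTPUT_0")
print("predicted class id:", int(np.argmax(logits)))
```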
- Tutorial (Chinese)
- Build
- Model Configuration
- OneFlow Cookies: Serving (Chinese)
- OneFlow Cookies: Serving (English)
- Command Line Tool: oneflow-serving
The current version of OneFlow does not support concurrent execution of multiple instances of the same model. You can launch multiple containers (which is easy to do with Kubernetes) to work around this limitation.