- designed following the Open Inference Protocol (OIP), an emerging industry standard for observable and interoperable machine learning inference
- auto-documentation using FastAPI and Pydantic
- add linting, testing and pre-commit hooks
- build and push a Docker image of the API to Docker Hub
- use GitHub Actions for automation
API | Verb | Path |
---|---|---|
Inference | POST | v2/models/<model_name>[/versions/<model_version>]/infer |
Model Metadata | GET | v2/models/<model_name>[/versions/<model_version>] |
Server Ready | GET | v2/health/ready |
Server Live | GET | v2/health/live |
Server Metadata | GET | v2 |
Model Ready | GET | v2/models/<model_name>[/versions/<model_version>]/ready |
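The bracketed `[/versions/<model_version>]` segment in the table above is optional. A small sketch of how those paths could be built (helper names are mine, not from the repo):

```python
from typing import Optional

# Hypothetical helpers that build two of the OIP paths from a model name
# and an optional version; the version segment is added only when given.
def infer_path(model_name: str, model_version: Optional[str] = None) -> str:
    version = f"/versions/{model_version}" if model_version else ""
    return f"v2/models/{model_name}{version}/infer"

def model_ready_path(model_name: str, model_version: Optional[str] = None) -> str:
    version = f"/versions/{model_version}" if model_version else ""
    return f"v2/models/{model_name}{version}/ready"

print(infer_path("my-model"))       # v2/models/my-model/infer
print(infer_path("my-model", "1"))  # v2/models/my-model/versions/1/infer
```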
API | Definition |
---|---|
Inference | The /infer endpoint performs inference on a model. The response is the prediction result. |
Model Metadata | The “model metadata” API is a per-model endpoint that returns details about the model passed in the path. |
Server Ready | The “server ready” health API indicates if all the models are ready for inferencing. The “server ready” health API can be used directly to implement the Kubernetes readinessProbe. |
Server Live | The “server live” health API indicates if the inference server is able to receive and respond to metadata and inference requests. The “server live” API can be used directly to implement the Kubernetes livenessProbe. |
Server Metadata | The “server metadata” API returns details describing the server. |
Model Ready | The “model ready” health API indicates if a specific model is ready for inferencing. The model name and (optionally) version must be available in the URL. |
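For the inference endpoint described above, the OIP (KServe V2) JSON body lists input tensors with a name, shape, datatype, and data. A sketch with made-up tensor values:

```python
import json

# Sketch of an OIP (KServe V2) inference request body; the tensor name,
# shape, and data here are illustrative, not from the repo.
request_body = {
    "inputs": [
        {
            "name": "input-0",
            "shape": [1, 3],
            "datatype": "FP32",
            "data": [0.1, 0.2, 0.3],
        }
    ]
}

payload = json.dumps(request_body)
# POST this payload to v2/models/<model_name>/infer; a successful response
# carries the prediction in the same tensor structure under "outputs".
print(payload)
```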
Go to the 1/setup-start branch and follow the instructions. For each subsequent branch, the instructions are in its README.md.
The structure is as follows:
- Setup
- Implement Endpoints
- Improve docs
- Restructure
- Add Linting & Tests
- CI with GitHub Actions
- Dockerise and push to Docker Hub
Throughout these stages I share lots of links to documentation. Some libraries have great docs, and thankfully the ones used here have excellent docs and explanations. If you learn anything from this repo, I hope it's at least the habit of checking the docs of the libraries you use when you need an answer. If you just came here for a quick look: read the FastAPI docs like a book, or the OIP docs page, or any of the other mentioned tools' docs.
While I think this takes a beginner from just /predict and introduces them to some important concepts, I also suggest looking into Eric Riddoch's teaching material: Taking Python to Production and Cloud Engineering for Python Devs.

If you are curious about MLOps on a wider scale (or about a model's life outside a Jupyter notebook), I suggest these resources:
- (course) MLOps zoomcamp
- (blog+course) Marvelous MLOps
- (blog) MLOps
- (book) MLE with Python 2nd ed. by Andy McMahon
- (book) Designing ML Systems by Chip Huyen
- (paper) Towards CRISP-ML(Q): A Machine Learning Process Model with Quality Assurance Methodology
- make videos
- update the instructions for the steps based on feedback
- improve the endpoints' structure
- open to feedback and help to improve this 'course'