This project demonstrates two ways to put a machine learning model into production: an offline pipeline and an API. The key topics covered include fetching data, running the model, saving predictions, and protecting credentials.
Offline Pipeline Workflow:
- Fetch Data: Retrieve data from the database needed for model inference.
- Run Model: Execute the model on the data (after cleaning and processing).
- Save Predictions: Store the model predictions in a database table for easy retrieval with SQL queries (the full flow is sketched below).
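For illustration, here is a minimal sketch of the three-step flow, assuming a SQLite database; the table and column names (`customers`, `churn_predictions`, `tenure`, `monthly_charges`) are hypothetical and not taken from this repo:

```python
# Minimal sketch of the offline pipeline: fetch -> predict -> save.
# Table, column, and file names below are illustrative only.
import sqlite3

import joblib
import pandas as pd


def run_offline_pipeline(db_path: str, model_path: str) -> None:
    conn = sqlite3.connect(db_path)

    # 1. Fetch: pull the features the model needs from the database.
    features = pd.read_sql_query(
        "SELECT customer_id, tenure, monthly_charges FROM customers", conn
    )

    # 2. Run model: load the serialized estimator and score the rows.
    model = joblib.load(model_path)
    features["churn_prediction"] = model.predict(
        features.drop(columns=["customer_id"])
    )

    # 3. Save: write predictions back so they are queryable with SQL.
    features[["customer_id", "churn_prediction"]].to_sql(
        "churn_predictions", conn, if_exists="replace", index=False
    )
    conn.close()
```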
Command-Line Argument Parsing:
`argparse` is a popular Python library for parsing command-line arguments when running Python scripts from a terminal.
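A minimal sketch of how a script here might parse its `-c config.json` argument; the flag matches the usage example further down, but the exact parser in this repo may differ:

```python
# Sketch: read a config-file path from the command line.
import argparse

parser = argparse.ArgumentParser(description="Run the ML pipeline.")
parser.add_argument("-c", "--config", required=True,
                    help="Path to a JSON configuration file")
args = parser.parse_args()

print(f"Using config file: {args.config}")
```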
Model Serialization:
`joblib` is a Python package that handles saving and loading machine learning models.
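A minimal sketch of saving and reloading a scikit-learn model with `joblib`; the model type and file name are illustrative:

```python
import joblib
from sklearn.linear_model import LogisticRegression

# Train a toy model (stand-in for the real pipeline's estimator).
model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])

joblib.dump(model, "churn_model.joblib")          # save after training
loaded_model = joblib.load("churn_model.joblib")  # load at inference time
```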
Credential Management:
- The `keyring` package can be used to protect credentials, storing them securely rather than hard-coding them in scripts (sketched below).
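A minimal sketch of the `keyring` calls involved; the service name (`ml_pipeline_db`) and user name are hypothetical:

```python
import keyring

# Store the database password once, e.g. from an interactive session.
keyring.set_password("ml_pipeline_db", "db_user", "s3cr3t-password")

# Later, the pipeline retrieves it at runtime instead of hard-coding it.
db_password = keyring.get_password("ml_pipeline_db", "db_user")
```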
API for Model Inference:
- An alternative way to put a model into production is to create an API.
- APIs allow you to call a model residing on a different server or environment.
REST API:
- One of the most common API architectures is REST.
- REST APIs allow you to make HTTP requests, such as GET or POST requests (a client call is sketched below).
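A minimal sketch of such a call using the `requests` package; the URL and payload shape are hypothetical and do not reflect this project's actual endpoint:

```python
import requests

# POST one customer's features to a prediction endpoint.
response = requests.post(
    "http://localhost:8000/predict",
    json={"tenure": 12, "monthly_charges": 70.5},
)
response.raise_for_status()
print(response.json())  # e.g. {"churn_prediction": 0}
```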
FastAPI:
- FastAPI is a powerful library for creating APIs for your Python code.
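A minimal sketch of a FastAPI app that serves predictions; the route, request schema, and model file are illustrative and may not match this repo's actual code:

```python
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("churn_model.joblib")  # hypothetical model file


class Customer(BaseModel):
    tenure: int
    monthly_charges: float


@app.post("/predict")
def predict(customer: Customer) -> dict:
    # Score a single customer and return the predicted class.
    prediction = model.predict([[customer.tenure, customer.monthly_charges]])
    return {"churn_prediction": int(prediction[0])}
```

Such an app is served with `uvicorn`, as shown in the API usage section below.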
Requirements:
- Python 3.x
- `argparse` for command-line argument parsing
- `joblib` for model serialization
- `keyring` for credential management
- `FastAPI` for creating APIs
- `sklearn` for machine learning models
Offline Pipeline:
- Run the script to fetch data, run the model, and save predictions:
`python ml_pipeline_train_model.py -c config.json`
API:
- Serve the FastAPI app with uvicorn:
`uvicorn api:app --reload`
Project Structure:
ML_Pipeline/
│
├── data/
│   ├── customer_churn_data.csv
│   └── customers/
│
├── customerData.py
├── dataSet.py
├── main.py
├── ml_model.py
├── ml_offline_predictions.py
├── ml_offline_predictions_config.json
├── ml_pipeline_config.json
├── ml_pipeline_train_model.py
└── README.md