An automated, end-to-end machine learning pipeline for training, evaluating, and versioning models. This project was built to demonstrate a foundational understanding of MLOps principles.
This pipeline automates the entire machine learning workflow. It starts by ingesting raw data, proceeds to clean and preprocess it, trains a machine learning model, tunes its hyperparameters, and finally versions the resulting model and its performance metrics.
- Automated Data Preprocessing: Handles missing values, scales numerical features, and encodes categorical features.
- Model Training: Trains a classification/regression model on the prepared data.
- Hyperparameter Tuning: Automatically finds the best hyperparameters using techniques like Grid Search.
- Model Versioning: Uses MLflow to track experiments, log metrics, and version models.
- Language: Python
- Data Handling: Pandas, NumPy
- ML Framework: Scikit-learn
- Experiment Tracking: MLflow
- Orchestration: Python scripts
-
Clone the repository:
git clone https://github.com/Fugant1/ml-model-factory.git
-
Install the dependencies:
pip install -r requirements.txt
-
Move your dataset to the data folder:
mv your_dataset_file data/
-
Configure the pipeline: Open src/config.py and set the FILENAME, TARGET_COLUMN, and MODEL_SELECTED variables to match your data and preferences.
-
Run the main pipeline script:
python -m src.pipeline.py