# House Price Predictor

This project demonstrates an end-to-end machine learning pipeline for house price prediction using ZenML and MLflow. It incorporates MLOps principles to automate and track each step of the pipeline, including data ingestion, preprocessing, model building, and deployment.
## Table of Contents

- Introduction
- Project Structure
- Technologies Used
- Pipeline Steps
- How to Run the Project
- Experiment Tracking with MLflow
- Deployment
- CI/CD
- Results
## Introduction

The project aims to predict house prices based on various features of the house. It leverages ZenML to create reproducible machine learning pipelines and MLflow for experiment tracking and model deployment. The pipeline automates the process from data ingestion to model evaluation and is set up for continuous integration and deployment (CI/CD) for machine learning.
## Project Structure

```
house-price-predictor/
├── analysis/
│   ├── EDA.ipynb                      # Exploratory Data Analysis notebook
│   └── analyze_src/
│       ├── basic_data_inspection.py
│       ├── bivariate_analysis.py
│       ├── missing_values_analysis.py
│       └── univariate_analysis.py
├── data/
│   └── archive.zip                    # Raw data for house prices
├── extracted_data/
│   └── AmesHousing.csv                # Generated file (extracted from archive.zip)
├── src/
│   ├── data_splitter.py               # Splitting data into training and test sets
│   ├── feature_engineering.py         # Feature engineering logic
│   ├── handle_missing_values.py       # Handling missing values
│   ├── ingest_data.py                 # Data ingestion logic
│   ├── model_building.py              # Model training logic
│   ├── model_evaluator.py             # Model evaluation logic
│   └── outlier_detection.py           # Detect and handle outliers
├── steps/
│   ├── __pycache__/                   # Compiled Python files
│   ├── data_ingestion_step.py         # Data ingestion step
│   ├── data_splitter_step.py          # Data splitting into training and test sets
│   ├── dynamic_importer.py            # Dynamic importing for pipeline steps
│   ├── feature_engineering_step.py    # Feature engineering step
│   ├── handle_missing_values_step.py  # Handling missing values
│   ├── model_building_step.py         # Model training step
│   ├── model_evaluator_step.py        # Model evaluation step
│   ├── model_loader.py                # Model loading functionality
│   ├── outlier_detection_step.py      # Detect and handle outliers
│   ├── prediction_service_loader.py   # Loads the prediction service for deployment
│   └── predictor.py                   # Script for making predictions
├── pipelines/
│   ├── deployment_pipeline.py         # Definition of the deployment pipeline
│   └── training_pipeline.py           # Definition of the training pipeline
├── run_pipeline.py                    # Script to run the training pipeline
├── run_deployement.py                 # Script to deploy the model using MLflow
├── requirements.txt                   # Python dependencies
└── README.md                          # This file
```
## Technologies Used

- ZenML: Framework for creating reproducible ML pipelines.
- MLflow: Experiment tracking, model management, and deployment.
- Python: Core programming language for the project.
- Pandas, Scikit-learn: Data manipulation and machine learning libraries.
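The dependencies above would typically be pinned in `requirements.txt`; the exact contents depend on the project, but a minimal sketch might look like:

```
zenml
mlflow
pandas
scikit-learn
numpy
```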
## Pipeline Steps

- Data Ingestion: Reads the raw data from a ZIP file and loads it into a Pandas DataFrame.
- Handling Missing Values: Cleans the dataset by filling missing values using specified strategies.
- Feature Engineering: Creates new features and applies log transformations to relevant features.
- Outlier Detection: Identifies and removes outliers from the dataset based on specific criteria.
- Data Splitting: Splits the data into training and test sets.
- Model Building: Trains a machine learning model using Scikit-learn and tracks the experiment with MLflow.
- Model Evaluation: Evaluates the model using MSE and other metrics, and logs them to MLflow.
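The preprocessing steps above can be sketched in plain Pandas. This is an illustrative toy, not the project's actual code (which lives in `src/`), and the column names are made up for the example:

```python
import numpy as np
import pandas as pd

# Toy data standing in for the Ames housing features.
df = pd.DataFrame({
    "LotArea": [8450.0, 9600.0, 11250.0, 9550.0, None, 200000.0],
    "SalePrice": [208500.0, 181500.0, 223500.0, 140000.0, 250000.0, 755000.0],
})

# Handling missing values: fill numeric gaps with the column median.
df["LotArea"] = df["LotArea"].fillna(df["LotArea"].median())

# Feature engineering: log-transform a skewed feature.
df["LotArea_log"] = np.log1p(df["LotArea"])

# Outlier detection: drop rows outside 1.5 * IQR on the target.
q1, q3 = df["SalePrice"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = df["SalePrice"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df_clean = df[mask]

# Data splitting: simple 80/20 train/test split.
train = df_clean.sample(frac=0.8, random_state=42)
test = df_clean.drop(train.index)
```

In the real pipeline each of these transformations is wrapped in its own ZenML step (see `steps/`), which is what makes the runs reproducible and trackable.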
## How to Run the Project

1. **Clone the Repository**

   ```bash
   git clone https://github.com/your-username/house-price-predictor.git
   cd house-price-predictor
   ```

2. **Set Up a Virtual Environment**

   ```bash
   python -m venv venv
   source venv/bin/activate   # For Linux/Mac
   venv\Scripts\activate      # For Windows
   ```

3. **Install the Requirements**

   ```bash
   pip install -r requirements.txt
   ```

4. **Initialize ZenML and Set Up the Stack**

   ```bash
   zenml init
   zenml integration install mlflow -y

   # Register a stack with MLflow for experiment tracking and deployment
   zenml experiment-tracker register mlflow_tracker --flavor=mlflow
   zenml model-deployer register mlflow --flavor=mlflow
   zenml stack register local-mlflow-stack \
       -a default -o default -d mlflow -e mlflow_tracker --set
   ```

5. **Run the Pipeline**

   ```bash
   python run_pipeline.py
   ```

6. **Run the MLflow UI**

   ```bash
   mlflow ui --backend-store-uri 'sqlite:///mlflow.db'
   ```
## Experiment Tracking with MLflow

MLflow is used for tracking all your experiments. You can inspect your model's performance and compare different runs through the MLflow UI. After running the pipeline, use the following command to launch the MLflow UI:
```bash
mlflow ui
```

## Deployment

The project includes a deployment script (`run_deployement.py`) to deploy the trained model using MLflow. To deploy, simply run the following command:
```bash
python run_deployement.py
```

## CI/CD

This project adheres to basic CI/CD principles for machine learning, enabling automated runs, versioning of pipelines, and seamless deployment. Integration with tools like GitHub Actions can be added for fully automated pipelines.
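Once the deployment from the previous section is running, the MLflow scoring server can be queried over HTTP at its `/invocations` endpoint. A minimal sketch, assuming a local server (the port and feature names here are illustrative, not taken from the project):

```python
import json
from urllib import request

# Assumed local endpoint of the MLflow scoring server; adjust to your deployment.
URL = "http://127.0.0.1:8000/invocations"

def build_payload(columns, rows):
    """Build the JSON body in MLflow's dataframe_split input format."""
    return json.dumps({"dataframe_split": {"columns": columns, "data": rows}})

payload = build_payload(["LotArea", "OverallQual"], [[8450, 7], [9600, 6]])

def predict(body: str) -> str:
    """POST the payload to /invocations (requires a live scoring server)."""
    req = request.Request(
        URL, data=body.encode(), headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        return resp.read().decode()

# predict(payload)  # uncomment once the deployment is running
```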
## Results

After running the pipeline, evaluation metrics such as Mean Squared Error (MSE) are tracked in MLflow. The trained model and its metrics can be accessed and evaluated through the MLflow dashboard.
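For reference, the MSE reported above is simply the mean of squared residuals between actual and predicted prices; a minimal computation on made-up numbers:

```python
def mse(y_true, y_pred):
    """Mean Squared Error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Two toy houses, each predicted $10,000 off:
print(mse([200_000, 150_000], [190_000, 160_000]))  # 100000000.0
```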
