This project implements an end-to-end Machine Learning Operations (MLOps) pipeline for predicting hotel reservation cancellations. It combines data preprocessing, model training, a Flask-based web application, CI/CD with Jenkins, and containerized deployment with Docker; deployment to Google Cloud Run is assumed and may still need to be added. DVC and MLflow provide data versioning and experiment tracking so that results stay reproducible.
The dataset used is sourced from a hotel reservation system, and the goal is to predict whether a reservation will be canceled based on various features. The pipeline includes data versioning, experiment tracking, model deployment, and automated workflows.
- Project Structure
- Features
- Technologies Used
- Installation
- Usage
- Dataset
- MLOps Pipeline
- Model Training
- Web Application
- CI/CD Pipeline
- Deployment
- Monitoring and Logging
- Contributing
- License
- Contact
MLOPS_PROJECT/
├── data/                       # Dataset and data-related files
│   ├── raw/                    # Raw dataset
│   └── processed/              # Processed dataset
├── notebooks/                  # Jupyter notebooks for EDA and prototyping
├── src/                        # Source code for the project
│   ├── data_preprocessing.py   # Data cleaning and preprocessing
│   ├── model_training.py       # Model training and evaluation
│   ├── app.py                  # Flask web application
│   ├── predict.py              # Prediction script
│   └── utils.py                # Utility functions
├── tests/                      # Unit and integration tests
├── Dockerfile                  # Docker configuration
├── Jenkinsfile                 # Jenkins CI/CD pipeline configuration
├── requirements.txt            # Python dependencies
├── README.md                   # Project documentation
├── .gitignore                  # Git ignore file
├── mlflow/                     # MLflow tracking artifacts
└── scripts/                    # Automation scripts
- End-to-End MLOps Pipeline: Covers data preprocessing, model training, experiment tracking, deployment, and monitoring.
- Data Versioning: Uses DVC for versioning datasets and models to ensure reproducibility.
- Experiment Tracking: MLflow for tracking experiments, hyperparameters, and model performance.
- Flask Web App: A user-friendly web interface for making predictions.
- CI/CD Integration: Automated testing, building, and deployment using Jenkins and GitHub Actions.
- Containerization: Docker for packaging the application and dependencies.
- Cloud Deployment: Assumed deployment to Google Cloud Run for scalable, serverless hosting.
- Modular Codebase: Organized and maintainable code structure for scalability.
- Programming Language: Python 3.8+
- Machine Learning: Scikit-learn, XGBoost
- Data Versioning: DVC
- Experiment Tracking: MLflow
- Web Framework: Flask
- Containerization: Docker
- CI/CD: Jenkins, GitHub Actions
- Cloud Platform: Google Cloud Run (assumed)
- Monitoring: Logging integrated with Flask (Prometheus/Grafana can be added)
- Others: Pandas, NumPy, Jupyter Notebooks
- Python 3.8 or higher
- Docker
- Git
- Google Cloud SDK (for Google Cloud Run deployment)
- Jenkins (for CI/CD pipeline)
- MLflow server (for experiment tracking)
- DVC (for data versioning)
- Clone the Repository:
git clone https://github.com/sanatwalia896/MLOPS_PROJECT.git
cd MLOPS_PROJECT
- Set Up Virtual Environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install Dependencies:
pip install -r requirements.txt
- Initialize DVC:
dvc init
dvc pull
- Set Up MLflow: Ensure an MLflow tracking server is running. Update the MLflow tracking URI in src/model_training.py if needed:
import mlflow
mlflow.set_tracking_uri("http://<your-mlflow-server>:5000")
- Set Up Docker (if running locally): Build the Docker image:
docker build -t hotel-cancellation-prediction .
- Start the Flask application:
python src/app.py
- Open your browser and navigate to http://127.0.0.1:5000 to access the web interface.
- Input reservation details to get cancellation predictions.
Send a POST request to the Flask app:
curl -X POST -H "Content-Type: application/json" -d '{"lead_time": 30, "arrival_date_year": 2023, "arrival_date_month": 7, ...}' http://127.0.0.1:5000/predict
Run unit and integration tests:
pytest tests/
The dataset contains hotel reservation data with features such as:
- lead_time: Number of days between booking and arrival
- arrival_date_year: Year of arrival
- arrival_date_month: Month of arrival
- stays_in_weekend_nights: Number of weekend nights
- stays_in_week_nights: Number of weekday nights
- adults, children, babies: Number of guests
- is_canceled: Target variable (1 for canceled, 0 for not canceled)
The raw dataset is stored in data/raw/, and processed data is saved in data/processed/ using DVC for versioning.
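As a minimal sketch of how the columns listed above might be sanity-checked before preprocessing, the following uses only the Python standard library. The helper name `summarize_reservations` and the choice to read from a CSV file-like object are assumptions for illustration; the repository's actual logic lives in src/data_preprocessing.py and may differ.

```python
import csv
import io

# Columns named in the dataset description; the real file may contain more.
REQUIRED_COLUMNS = [
    "lead_time", "arrival_date_year", "arrival_date_month",
    "stays_in_weekend_nights", "stays_in_week_nights",
    "adults", "children", "babies", "is_canceled",
]


def summarize_reservations(csv_file):
    """Check required columns and return (row_count, cancellation_rate).

    Hypothetical helper: `csv_file` is any text file-like object.
    """
    reader = csv.DictReader(csv_file)
    present = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in present]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    rows = 0
    canceled = 0
    for row in reader:
        label = row["is_canceled"].strip()
        # The target is documented as 1 (canceled) or 0 (not canceled).
        if label not in ("0", "1"):
            raise ValueError(f"unexpected is_canceled value: {label!r}")
        rows += 1
        canceled += int(label)
    rate = canceled / rows if rows else 0.0
    return rows, rate


# Tiny in-memory sample with made-up rows, not real data.
sample = io.StringIO(
    "lead_time,arrival_date_year,arrival_date_month,stays_in_weekend_nights,"
    "stays_in_week_nights,adults,children,babies,is_canceled\n"
    "30,2023,7,2,3,2,0,0,1\n"
    "5,2023,8,0,1,1,0,0,0\n"
)
print(summarize_reservations(sample))
```

A check like this could run before training so that a malformed file fails early instead of producing a silently degraded model.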
The project follows a comprehensive MLOps pipeline:
- Data Ingestion and Preparation:
  - Raw data is cleaned and preprocessed using src/data_preprocessing.py.
  - DVC tracks datasets to ensure reproducibility.
- Model Development and Training:
  - Models (e.g., XGBoost, Random Forest) are trained using src/model_training.py.
  - MLflow tracks experiments, hyperparameters, and metrics (e.g., accuracy, F1-score).
- Model Evaluation:
  - Models are evaluated using metrics like RMSE, MAE, and F1-score.
  - Cross-validation ensures robustness.
- Model Deployment:
  - The trained model is integrated into a Flask app (src/app.py).
  - The app is containerized using Docker.
  - Assumed deployment to Google Cloud Run for scalability.
- CI/CD:
  - Jenkins and GitHub Actions automate testing, building, and deployment.
  - The Jenkinsfile defines the pipeline stages.
- Monitoring:
  - Basic logging is implemented in the Flask app.
  - Extendable to Prometheus and Grafana for advanced monitoring.
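To make the evaluation step more concrete, here is a small, dependency-free sketch of k-fold cross-validation scored with F1, the listed metric that applies most directly to a binary target such as is_canceled. The function names, the toy threshold "model", and the sample values are all assumptions for illustration; the project itself presumably relies on scikit-learn utilities inside src/model_training.py.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (1 = canceled)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous, near-equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_val_f1(X, y, fit, predict, k=5):
    """Average F1 over k folds for caller-supplied fit/predict functions."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(y), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in test_idx])
        scores.append(f1_score([y[i] for i in test_idx], preds))
    return sum(scores) / len(scores)


# Toy stand-in model: predict "canceled" when lead_time exceeds the
# mean lead_time seen in training. Values below are made up.
def fit_threshold(X_train, y_train):
    return sum(X_train) / len(X_train)


def predict_threshold(threshold, X_test):
    return [1 if x > threshold else 0 for x in X_test]


lead_times = [5, 200, 12, 150, 30, 180, 8, 220, 15, 160]
labels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
print(cross_val_f1(lead_times, labels, fit_threshold, predict_threshold, k=5))
```

The folds here are contiguous and unshuffled; for imbalanced cancellation data, a stratified splitter such as scikit-learn's `StratifiedKFold` would likely be the better choice.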
To train the model:
- Run the preprocessing script:
python src/data_preprocessing.py
- Train the model with MLflow tracking:
python src/model_training.py
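- View experiment results in the MLflow UI:
mlflow ui
Navigate to http://127.0.0.1:5000 to explore logged metrics and artifacts. If the Flask app is already running on port 5000, start the UI on another port, for example with mlflow ui --port 5001.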
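The preprocessing and training commands can also be chained from one script so that training never runs after a failed preprocessing step. This is a sketch only: the `run_pipeline` helper and its `runner` parameter are hypothetical, and the script paths are the ones listed in the project structure.

```python
import subprocess
import sys

# Script paths taken from the project structure; order matters.
PIPELINE_STEPS = [
    "src/data_preprocessing.py",
    "src/model_training.py",
]


def run_pipeline(steps=PIPELINE_STEPS, runner=subprocess.run):
    """Run each script in order and return the commands that were executed.

    `runner` defaults to subprocess.run and is injectable for testing.
    """
    executed = []
    for script in steps:
        command = [sys.executable, script]
        # check=True raises CalledProcessError on a non-zero exit code,
        # which stops the loop before the next step starts.
        runner(command, check=True)
        executed.append(command)
    return executed


# Usage, from the repository root:
#     run_pipeline()
```

Using `sys.executable` keeps both steps inside the same virtual environment that launched the runner.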
The Flask-based web app (src/app.py) provides:
- A form for users to input reservation details.
- A prediction endpoint (/predict) for API-based predictions.
- Logging for tracking requests and errors.
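The request handling behind an endpoint like /predict can be sketched without any web framework, which also makes it easy to unit test. Everything here is an assumption for illustration: the `handle_predict` function, the `is_canceled` response key, the stand-in model, and the use of only the three fields shown in the example request (the real payload contains more). The actual implementation is in src/app.py.

```python
import logging

logger = logging.getLogger("app")

# Only the fields visible in the example request; the real payload has more.
EXAMPLE_FIELDS = ("lead_time", "arrival_date_year", "arrival_date_month")


def handle_predict(payload, predict_fn, required=EXAMPLE_FIELDS):
    """Return (response_body, http_status) for a /predict-style request."""
    if not isinstance(payload, dict):
        logger.warning("rejected request: JSON object expected")
        return {"error": "JSON object expected"}, 400

    missing = [name for name in required if name not in payload]
    if missing:
        logger.warning("rejected request: missing fields %s", missing)
        return {"error": f"missing fields: {missing}"}, 400

    try:
        features = [float(payload[name]) for name in required]
    except (TypeError, ValueError):
        logger.warning("rejected request: non-numeric field")
        return {"error": "all fields must be numeric"}, 400

    try:
        label = int(predict_fn(features))
    except Exception:
        # Log the traceback but return a generic message to the client.
        logger.exception("prediction failed")
        return {"error": "prediction failed"}, 500

    logger.info("prediction served: %s", label)
    return {"is_canceled": label}, 200


def dummy_model(features):
    """Stand-in for the trained model: flags long lead times as canceled."""
    return 1 if features[0] > 100 else 0


print(handle_predict(
    {"lead_time": 30, "arrival_date_year": 2023, "arrival_date_month": 7},
    dummy_model,
))
```

In a Flask view, the parsed JSON body would be passed as `payload` and the returned pair converted into a JSON response with the given status code.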
To run the app locally:
python src/app.py
The project uses Jenkins for CI/CD, with the pipeline defined in Jenkinsfile. Key stages include:
- Code Checkout: Pulls the latest code from GitHub.
- Testing: Runs unit and integration tests using pytest.
- Build: Builds the Docker image.
- Push: Pushes the Docker image to a registry (e.g., Docker Hub).
- Deploy: Deployment to Google Cloud Run.
To set up the CI/CD pipeline:
- Configure Jenkins with the repository URL.
- Ensure Docker and Google Cloud SDK are installed on the Jenkins server.
- Update the Jenkinsfile with your Docker registry and Google Cloud Run credentials.
The project assumes deployment to Google Cloud Run, a serverless platform for running containerized applications. The assumed steps are:
- Push the Docker image to a container registry (e.g., Google Container Registry):
docker tag hotel-cancellation-prediction gcr.io/<project-id>/hotel-cancellation-prediction
docker push gcr.io/<project-id>/hotel-cancellation-prediction
- Deploy to Google Cloud Run:
gcloud run deploy hotel-cancellation-service \
  --image gcr.io/<project-id>/hotel-cancellation-prediction \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated
- Access the deployed app via the provided Cloud Run URL.
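Once a service URL is available, a quick smoke test against /predict can confirm the deployment responds. The sketch below uses only the standard library; `build_predict_request` and `smoke_test` are hypothetical helpers, the base URL is a placeholder you would replace with your own Cloud Run URL, and the payload must contain whatever fields the deployed app actually expects.

```python
import json
import urllib.request


def build_predict_request(base_url, reservation):
    """Build (but do not send) a JSON POST request to <base_url>/predict."""
    url = base_url.rstrip("/") + "/predict"
    body = json.dumps(reservation).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def smoke_test(base_url, reservation, timeout=10):
    """Send the request and return (status_code, decoded_body).

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so a
    normal return means the service answered successfully.
    """
    request = build_predict_request(base_url, reservation)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")


# Usage (placeholder URL, replace with the one Cloud Run prints):
#     smoke_test("https://<your-cloud-run-url>", {"lead_time": 30})
```

Separating request construction from sending keeps the first half testable offline, for example in the Jenkins testing stage.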
- Logging: The Flask app logs requests and errors to the console. Logs can be extended to a file or external service (e.g., ELK Stack).
- Monitoring: Integrate Prometheus and Grafana for real-time metrics (e.g., response time, error rates). Configuration is not included but can be added.
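Extending console logging to a file, as mentioned above, needs only the standard `logging` module. This is a minimal sketch: the logger name `hotel_app`, the default path `app.log`, and the rotation limits are assumptions, not values taken from the project.

```python
import logging
from logging.handlers import RotatingFileHandler


def configure_logging(log_path="app.log", level=logging.INFO):
    """Log to the console and to a size-rotated file."""
    logger = logging.getLogger("hotel_app")  # hypothetical logger name
    logger.setLevel(level)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls

    formatter = logging.Formatter(
        "%(asctime)s %(levelname)s %(name)s: %(message)s"
    )

    console = logging.StreamHandler()
    console.setFormatter(formatter)
    logger.addHandler(console)

    # Rotate at roughly 1 MB and keep up to 3 old files.
    file_handler = RotatingFileHandler(
        log_path, maxBytes=1_000_000, backupCount=3
    )
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)
    return logger


# Usage:
#     logger = configure_logging("logs/app.log")
#     logger.info("prediction request received")
```

On Cloud Run, container output written to the console is what the platform collects, so the file handler is mainly useful for local runs.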
Contributions are welcome! To contribute:
- Fork the repository.
- Create a new branch (git checkout -b feature-branch).
- Make your changes and commit (git commit -m "Add feature").
- Push to the branch (git push origin feature-branch).
- Open a pull request.
Please follow the Code of Conduct and review CONTRIBUTING.md for guidelines.
This project is licensed under the MIT License. See the LICENSE file for details.
For questions or inquiries, contact the project maintainer:
- GitHub: sanatwalia896
- Email: codersanat896@gmail.com