End-to-End Machine Learning Pipeline 🚀
This project is an end-to-end MLOps system that predicts flight fares from flight details such as airline, route, booking class, and time to departure.
- ⛓️ Modular pipeline stages
- 🧪 ML experiment tracking with MLflow
- 📦 Data & model versioning with DVC
- 🚀 REST API with FastAPI
- 🐳 Docker containerization
- ⚙️ CI/CD automation via GitHub Actions & AWS ECR
Data Ingestion → Data Validation → Data Cleaning →
Data Transformation → Model Training → Model Evaluation →
Prediction API → Docker Containerization → CI/CD → AWS ECR Deployment
| Category | Tools Used |
|---|---|
| Language | Python 3.10 |
| ML Framework | Scikit-learn, XGBoost |
| Experimentation | MLflow + Dagshub |
| Versioning | DVC |
| Deployment | FastAPI + Uvicorn |
| Packaging | Docker |
| Automation | GitHub Actions → AWS ECR → EC2 Deployment |
├── config/
├── src/mlproject/
│ ├── components/
│ ├── pipelines/
│ ├── config/
│ ├── entities/
│ └── utils/
├── artifacts/
├── dvc.yaml
├── params.yaml
├── schema.yaml
├── Dockerfile
├── app.py
├── README.md
├── requirements.txt
└── .github/workflows/cicd.yml
git clone https://github.com/JavithNaseem-J/FareFinder.git
cd FareFinder
conda create -n <env name> python=3.10 -y
conda activate <env name>
pip install -r requirements.txt

Each pipeline stage is DVC-tracked and reproducible.
| Stage | Command |
|---|---|
| Data Ingestion | python main.py --stage data_ingestion |
| Data Validation | python main.py --stage data_validation |
| Data Cleaning | python main.py --stage data_cleaning |
| Data Transformation | python main.py --stage data_transformation |
| Model Training | python main.py --stage model_training |
| Model Evaluation | python main.py --stage model_evaluation |
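
The `--stage` flag in the table above suggests a small command-line dispatcher. The sketch below shows one way such an entry point could be written with `argparse`. It is not the project's actual `main.py`: the `run_stage` placeholder, the `STAGES` list, and the choice to run every stage when `--stage` is omitted are all assumptions made for illustration.

```python
import argparse

# Stage names taken from the table above, in pipeline order.
STAGES = [
    "data_ingestion",
    "data_validation",
    "data_cleaning",
    "data_transformation",
    "model_training",
    "model_evaluation",
]


def run_stage(name: str) -> str:
    """Hypothetical placeholder; a real project would call the matching pipeline class."""
    if name not in STAGES:
        raise ValueError(f"unknown stage: {name}")
    message = f"stage {name} finished"
    print(message)
    return message


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Run one pipeline stage or all of them")
    # Assumption: omitting --stage runs every stage in order.
    parser.add_argument("--stage", choices=STAGES, default=None)
    return parser.parse_args(argv)


def main(argv=None) -> list:
    """Run the selected stage, or all stages, and return their status messages."""
    args = parse_args(argv)
    selected = [args.stage] if args.stage else STAGES
    return [run_stage(name) for name in selected]
```

In a script, `main()` would be called under an `if __name__ == "__main__":` guard. Restricting `--stage` with `choices` means a mistyped stage name fails at argument parsing rather than partway through a run.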
Run the full pipeline:
dvc repro

MLflow experiment tracking:
- Logs parameters, metrics (R², MAE, MSE), and models
- Stores all experiments and the best model in the MLflow registry
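
As a rough illustration of the metrics listed above, the sketch below computes R², MAE, and MSE in plain Python and passes them to MLflow. The function names `regression_metrics` and `log_run` are hypothetical and not taken from this repository; the project itself may well use `sklearn.metrics` (`r2_score`, `mean_absolute_error`, `mean_squared_error`) instead. MLflow is imported lazily so that the metric code can run without it installed.

```python
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute R², MAE, and MSE for paired true and predicted values."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    # R² is undefined when all targets are identical; this sketch reports 0.0 then.
    r2 = 1.0 - ss_res / ss_tot if ss_tot else 0.0
    return {"r2": r2, "mae": mae, "mse": mse}


def log_run(params: dict, metrics: Dict[str, float]) -> None:
    """Log one run's parameters and metrics to the active MLflow tracking server."""
    import mlflow  # imported here so regression_metrics works without MLflow

    with mlflow.start_run():
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
```

For example, `regression_metrics([100, 200, 300], [110, 190, 300])` gives an MSE of about 66.67 and an R² of 0.99; that dictionary can then be handed to `log_run` together with the hyperparameters of the run.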
Build Docker image:
docker build -t flight-fare-app .
Run the container:
docker run -p 8080:8080 flight-fare-app

The CI/CD workflow includes:
- ✅ Code linting
- ✅ Unit tests (placeholder)
- ✅ Docker image build
- ✅ Image push to AWS ECR
- ✅ Auto-deploy to EC2 container (self-hosted)
Workflow file:
.github/workflows/cicd.yml
- ✅ End-to-End ML lifecycle pipeline
- ✅ Model tuning via RandomizedSearchCV
- ✅ MLflow-based experiment tracking
- ✅ CI/CD auto-deployment with GitHub → AWS
- ✅ Production-grade FastAPI backend
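
To make the FastAPI backend item more concrete, here is a minimal sketch of what a prediction endpoint could look like. Everything specific in it is an assumption: the `/predict` path, the `FlightQuery` field names (chosen to mirror the inputs named in the introduction), and the `predict_fn` callable standing in for the trained model. The real `app.py` may be organised differently.

```python
from dataclasses import asdict, dataclass


@dataclass
class FlightQuery:
    """Hypothetical request fields, mirroring the inputs named in the introduction."""
    airline: str
    source: str
    destination: str
    booking_class: str
    days_to_departure: int


def validate_query(payload: dict) -> FlightQuery:
    """Build a FlightQuery from a dict; missing or unexpected keys raise TypeError."""
    query = FlightQuery(**payload)
    if query.days_to_departure < 0:
        raise ValueError("days_to_departure must be non-negative")
    if query.source == query.destination:
        raise ValueError("source and destination must differ")
    return query


def create_app(predict_fn):
    """Wrap any callable that maps a feature dict to a fare in a FastAPI app."""
    from fastapi import Body, FastAPI, HTTPException  # lazy import

    app = FastAPI(title="Flight fare prediction (sketch)")

    @app.post("/predict")
    def predict(payload: dict = Body(...)):
        try:
            query = validate_query(payload)
        except (TypeError, ValueError) as exc:
            # Reject malformed requests with a 422 response.
            raise HTTPException(status_code=422, detail=str(exc))
        return {"predicted_fare": float(predict_fn(asdict(query)))}

    return app
```

An app built this way could be served with `uvicorn.run(create_app(model_fn), host="0.0.0.0", port=8080)`, which matches the port published by the `docker run` command. A Pydantic model would be the more conventional FastAPI choice for request validation; a dataclass is used here only so the validation logic can be exercised without FastAPI installed.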
Distributed under the MIT License.
See LICENSE for more information.


