πŸŽ‡ End-to-End ML Pipeline: US_VISA πŸŽ† An end-to-end machine learning pipeline built for the US_VISA project πŸ”„, including data ingestion, validation, training, and prediction. πŸ§ πŸ’Ό Using MongoDB for storage, EvidentlyAI for drift detection, and AWS for deployment, all automated with GitHub Actions πŸš€. Dockerized and ready to scale! πŸ”₯


πŸŽ‡ US_VISA End-to-End Machine Learning Project with Evidently AI πŸŽ‡

In this project we implemented an end-to-end machine learning pipeline named US_VISA. The dataset comes from Kaggle, and MongoDB is used for storing and retrieving the data. The pipeline covers Data Ingestion, Data Validation, Data Transformation, Model Trainer, Model Evaluation, Model Pusher, and a Prediction pipeline. We use Evidently AI to detect data drift and GitHub Actions to automate the entire pipeline. The project is containerized with Docker and deployed using AWS services: an S3 bucket for storing and retraining the model, ECR for storing the Docker image, and EC2 for running the project. πŸŽ‰πŸš€
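The stage names below mirror the project tree (`us_visa/components/` and `us_visa/pipeline/training_pipeline.py`), but the class and method names are illustrative assumptions, not the repository's exact API. A minimal sketch of how such a training pipeline chains its stages:

```python
# Illustrative sketch of a staged training pipeline; class and method
# names are assumptions based on the repository layout, not its exact API.

class DataIngestion:
    def run(self, state):
        # In the real project this pulls records from MongoDB and
        # splits them into train/test CSV files.
        state["train"], state["test"] = [1, 2, 3], [4]
        return state

class DataValidation:
    def run(self, state):
        # In the real project this checks the schema and writes an
        # Evidently data-drift report (report.yaml).
        state["validated"] = bool(state["train"]) and bool(state["test"])
        return state

class ModelTrainer:
    def run(self, state):
        # Stand-in for model fitting; the real stage saves model.pkl.
        state["model"] = sum(state["train"]) / len(state["train"])
        return state

class TrainingPipeline:
    """Runs each stage in order, passing a shared state dict along."""
    stages = [DataIngestion(), DataValidation(), ModelTrainer()]

    def run(self):
        state = {}
        for stage in self.stages:
            state = stage.run(state)
        return state

if __name__ == "__main__":
    print(TrainingPipeline().run())
```

Each stage only sees the shared state produced by the stages before it, which is what lets the real pipeline hand artifact paths (feature store, transformed arrays, trained model) from one component to the next.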


πŸŽ‰ How to Run

To run this project, first set up the environment: 🌟

```bash
conda create -n visa python=3.8 -y
conda activate visa
pip install -r requirements.txt
```

Export the required environment variable (in Git Bash).
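The variable name `MONGODB_URL` below is an assumption (check `us_visa/constants/__init__.py` for the exact key the project reads); this sketch shows the usual pattern of exporting the connection string and reading it from Python:

```python
import os

# MONGODB_URL is an assumed variable name; the project's constants module
# defines the actual key. Export it first, e.g. in Git Bash:
#   export MONGODB_URL="mongodb+srv://<user>:<password>@<cluster>/"
def get_mongo_url(env_key="MONGODB_URL"):
    url = os.environ.get(env_key)
    if url is None:
        raise EnvironmentError(
            f"{env_key} is not set; export it before running the pipeline"
        )
    return url

if __name__ == "__main__":
    # Local fallback for demonstration only.
    os.environ.setdefault("MONGODB_URL", "mongodb://localhost:27017")
    print(get_mongo_url())
```

Failing fast with a clear error when the variable is missing is preferable to letting a `None` connection string surface later as an opaque driver error.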

πŸŽ† Workflow πŸŽ†

After creating the project template:
- ✨ Update constants
- ✨ Update Entity modules
- ✨ Update the respective components
- ✨ Update the pipeline
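The workflow above (constants β†’ entities β†’ components β†’ pipeline) follows a common config-driven pattern. A hedged sketch of what an entity in `us_visa/entity/config_entity.py` might look like; the constant values and field names are assumptions, not the repository's exact definitions:

```python
import os
from dataclasses import dataclass, field

# Illustrative constants; in the project these live in us_visa/constants/.
ARTIFACT_DIR = "artifact"
TRAIN_FILE = "train.csv"
TEST_FILE = "test.csv"

@dataclass
class DataIngestionConfig:
    """Paths the data-ingestion component writes to (names are assumptions)."""
    ingestion_dir: str = os.path.join(ARTIFACT_DIR, "data_ingestion")
    train_path: str = field(init=False)
    test_path: str = field(init=False)

    def __post_init__(self):
        # Derived paths are computed from the base dir so that changing
        # one constant updates every downstream path consistently.
        self.train_path = os.path.join(self.ingestion_dir, "ingested", TRAIN_FILE)
        self.test_path = os.path.join(self.ingestion_dir, "ingested", TEST_FILE)

if __name__ == "__main__":
    cfg = DataIngestionConfig()
    print(cfg.train_path)
```

Because components receive such config objects instead of hard-coded paths, updating the constants module is enough to redirect the whole pipeline, which is why the workflow starts there.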


AWS CI/CD Deployment with GitHub Actions

1. Log in to the AWS console.

2. Create an IAM user for deployment with specific access:

   - EC2 access: virtual machines to run the application
   - ECR access: Elastic Container Registry to store your Docker image in AWS

   Attach these policies:

   - AmazonEC2ContainerRegistryFullAccess
   - AmazonEC2FullAccess
   - AmazonS3FullAccess

   Deployment overview:

   1. Build the Docker image from the source code
   2. Push the Docker image to ECR
   3. Launch an EC2 instance
   4. Pull the image from ECR onto the EC2 instance
   5. Run the Docker image on EC2

3. Create an ECR repository to store the Docker image

   - (example) Save the URI: 31586523395366.dkr.ecr.us-east-1.amazonaws.com/visarepo

4. Create an EC2 machine (Ubuntu)

5. Open EC2 and install Docker on the EC2 machine:

Copy the commands below and execute them one by one on the EC2 terminal:

```bash
# optional
sudo apt-get update -y
sudo apt-get upgrade

# required
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker ubuntu
newgrp docker
```

6. Configure EC2 as a self-hosted runner:

   Settings > Actions > Runners > New self-hosted runner > choose OS, then run the displayed commands one by one on the EC2 terminal.

πŸŽ‡ 7. Set up GitHub Secrets πŸŽ‡

To configure GitHub Actions for deployment, set the following secrets in your GitHub repository:
🌟 Secrets:

  • πŸ”‘ AWS_ACCESS_KEY_ID
  • πŸ”‘ AWS_SECRET_ACCESS_KEY
  • 🌍 AWS_DEFAULT_REGION
  • 🐳 ECR_REPO (URI or repository name)

πŸŽ‰ Project Tree Structure πŸŽ‰

.
└── US-VISA-MACHINE-LEARNING-END-TO-END-PROJECT/
    β”œβ”€β”€ .github/workflows/
    β”‚   └── aws.yaml
    β”œβ”€β”€ artifact/
    β”‚   └── 10_05_2024_03_23_14 (or time Stamp)/
    β”‚       β”œβ”€β”€ data_ingestion/
    β”‚       β”‚   β”œβ”€β”€ feature_store/
    β”‚       β”‚   β”‚   └── EasyVisa.csv
    β”‚       β”‚   └── ingested/
    β”‚       β”‚       β”œβ”€β”€ test.csv
    β”‚       β”‚       └── train.csv
    β”‚       β”œβ”€β”€ data_transformation/
    β”‚       β”‚   β”œβ”€β”€ transformed/
    β”‚       β”‚   β”‚   β”œβ”€β”€ test.npy
    β”‚       β”‚   β”‚   └── train.npy
    β”‚       β”‚   └── transformed_object/
    β”‚       β”‚       └── preprocessing.pkl
    β”‚       β”œβ”€β”€ data_validation/
    β”‚       β”‚   └── drift.repot/
    β”‚       β”‚       └── report.yaml
    β”‚       └── model_trainer/
    β”‚           └── trained_model/
    β”‚               └── model.pkl
    β”œβ”€β”€ config/
    β”‚   β”œβ”€β”€ model.yaml
    β”‚   └── schema.yaml
    β”œβ”€β”€ flowchat
    β”œβ”€β”€ logs
    β”œβ”€β”€ notebook/
    β”‚   β”œβ”€β”€ caboost
    β”‚   β”œβ”€β”€ boston_data_drift_report.html
    β”‚   β”œβ”€β”€ data_drift_demo_evidently.ipynb
    β”‚   β”œβ”€β”€ EasyVisa.csv
    β”‚   β”œβ”€β”€ EDA_us_visa_ipynb
    β”‚   β”œβ”€β”€ Feature_Engineering_and_Model_Training.ipynb
    β”‚   └── mongodb_demo.ipynb
    β”œβ”€β”€ static/
    β”‚   └── css/
    β”‚       └── style.css
    β”œβ”€β”€ templates/
    β”‚   └── usvisa.html
    β”œβ”€β”€ us_visa/
    β”‚   β”œβ”€β”€ __init__.py
    β”‚   β”œβ”€β”€ __pycache__/
    β”‚   β”œβ”€β”€ cloud_storage/
    β”‚   β”‚   β”œβ”€β”€ __init__.py
    β”‚   β”‚   └── aws_stroage.py
    β”‚   β”œβ”€β”€ components/
    β”‚   β”‚   β”œβ”€β”€ __pychache__/
    β”‚   β”‚   β”œβ”€β”€ __init__.py
    β”‚   β”‚   β”œβ”€β”€ data_ingestion.py
    β”‚   β”‚   β”œβ”€β”€ data_transformation.py
    β”‚   β”‚   β”œβ”€β”€ data_validation.py
    β”‚   β”‚   β”œβ”€β”€ model_evaluation.py
    β”‚   β”‚   β”œβ”€β”€ model_pusher.py
    β”‚   β”‚   └── model_trainer.py
    β”‚   β”œβ”€β”€ configuration/
    β”‚   β”‚   β”œβ”€β”€ __pycache__
    β”‚   β”‚   β”œβ”€β”€ logs/
    β”‚   β”‚   β”œβ”€β”€ __init__.py
    β”‚   β”‚   β”œβ”€β”€ aws_connection.py
    β”‚   β”‚   └── mongo_db_connection.py
    β”‚   β”œβ”€β”€ constants/
    β”‚   β”‚   β”œβ”€β”€ __pycache__/
    β”‚   β”‚   └── __init__.py
    β”‚   β”œβ”€β”€ data_access/
    β”‚   β”‚   β”œβ”€β”€ __pycache__/
    β”‚   β”‚   β”œβ”€β”€ __init__.py
    β”‚   β”‚   └── usvisa_data.py
    β”‚   β”œβ”€β”€ entity/
    β”‚   β”‚   β”œβ”€β”€ __pycache__/
    β”‚   β”‚   β”œβ”€β”€ __init__.py
    β”‚   β”‚   β”œβ”€β”€ artifact_entity.py
    β”‚   β”‚   β”œβ”€β”€ config_entity.py
    β”‚   β”‚   β”œβ”€β”€ estimator.py
    β”‚   β”‚   └── s3_estimator.py
    β”‚   β”œβ”€β”€ exception/
    β”‚   β”‚   β”œβ”€β”€ __pycache__/
    β”‚   β”‚   └── __init__.py
    β”‚   β”œβ”€β”€ logger/
    β”‚   β”‚   β”œβ”€β”€ __pycache__/
    β”‚   β”‚   └── __init__py
    β”‚   β”œβ”€β”€ pipeline/
    β”‚   β”‚   β”œβ”€β”€ __pycache__/
    β”‚   β”‚   β”œβ”€β”€ __init__.py
    β”‚   β”‚   β”œβ”€β”€ prediction_pipeline.py
    β”‚   β”‚   └── training_pipeline.py
    β”‚   └── utils/
    β”‚       β”œβ”€β”€ __pycache__/
    β”‚       β”œβ”€β”€ __init__.py
    β”‚       └── main_utils.py
    β”œβ”€β”€ us_visa.egg-info
    β”œβ”€β”€ .dockerignore
    β”œβ”€β”€ .gitignore
    β”œβ”€β”€ app.py
    β”œβ”€β”€ demo.py
    β”œβ”€β”€ Dockerfile
    β”œβ”€β”€ LICENSE
    β”œβ”€β”€ README.md
    β”œβ”€β”€ requirements.txt
    β”œβ”€β”€ setup.py
    └── template.py
