In this project we implemented an end-to-end machine learning project named US_VISA. We got the dataset from Kaggle and used MongoDB to store and retrieve the data. We then built the entire pipeline: Data Ingestion, Data Validation, Data Transformation, Model Trainer, Model Evaluation, Model Pusher, and a Prediction pipeline. We used EvidentlyAI to detect data drift and GitHub Actions to automate the pipeline. Finally, we containerized the project with Docker and used AWS services: an S3 bucket for storing and retraining the model, ECR for storing the Docker image, and EC2 for running the project.
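As a sketch of where the drift check sits, here is a minimal Evidently data-drift report (this uses the `Report` API available in Evidently 0.2+; the file names are illustrative, not the project's actual artifact paths):

```python
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# Illustrative file names; in this project the splits live under artifact/<timestamp>/.
reference = pd.read_csv("train.csv")   # data the model was trained on
current = pd.read_csv("test.csv")      # newly ingested batch to check

# Compare feature distributions of the current batch against the reference data.
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)

# Persist the result; the data_validation component records its findings in a YAML report.
report.save_html("data_drift_report.html")
```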
To run this project:

```bash
conda create -n visa python=3.8 -y
conda activate visa
pip install -r requirements.txt
```
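The pipeline pulls its raw data from MongoDB. Below is a minimal sketch of the connection logic, assuming the connection string is supplied through a `MONGODB_URL` environment variable (the variable, database, and collection names are illustrative):

```python
import os

import pandas as pd
from pymongo import MongoClient

# Read the connection string from the environment; never hard-code credentials.
client = MongoClient(os.environ["MONGODB_URL"])

# Illustrative database/collection names for the EasyVisa dataset.
collection = client["us_visa"]["visa_data"]

# Load every document into a DataFrame and drop Mongo's internal _id column.
df = pd.DataFrame(list(collection.find()))
if "_id" in df.columns:
    df = df.drop(columns=["_id"])
print(df.shape)
```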
After creating the project template, work through it in this order (see the sketch after the list):

* Update the constants
* Update the Entity modules
* Update the respective components
* Update the pipeline
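A minimal sketch of how the constants and entity modules typically feed the components (the names below are illustrative, not the project's exact definitions):

```python
import os
from dataclasses import dataclass

# constants/__init__.py pattern: one source of truth for names shared across components.
ARTIFACT_DIR = "artifact"
TRAIN_TEST_SPLIT_RATIO = 0.2

# entity/config_entity.py pattern: each component receives its own config object.
@dataclass
class DataIngestionConfig:
    feature_store_path: str = os.path.join(ARTIFACT_DIR, "data_ingestion", "feature_store")
    split_ratio: float = TRAIN_TEST_SPLIT_RATIO

# Components read only their config, so changing a constant propagates everywhere.
config = DataIngestionConfig()
print(config.feature_store_path)
```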
Create an IAM user for deployment with specific access:

1. EC2 access: EC2 is a virtual machine.
2. ECR: Elastic Container Registry, used to store your Docker image in AWS.
Deployment steps:

1. Build the Docker image of the source code.
2. Push the Docker image to ECR.
3. Launch your EC2 instance.
4. Pull the image from ECR onto EC2.
5. Run the Docker image on EC2.
Policy:

1. AmazonEC2ContainerRegistryFullAccess
2. AmazonEC2FullAccess
3. AmazonS3FullAccess
- Create an ECR repository to store the Docker image and save its URI, e.g. 31586523395366.dkr.ecr.us-east-1.amazonaws.com/visarepo
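If you want to double-check the URI from Python, a short boto3 call will do (a sketch; the repository name and region below are illustrative, and AWS credentials are assumed to be configured):

```python
import boto3

# Assumes credentials are available via environment variables or ~/.aws/credentials.
ecr = boto3.client("ecr", region_name="us-east-1")

# Look up the repository created above; "visarepo" is the illustrative name.
response = ecr.describe_repositories(repositoryNames=["visarepo"])
print(response["repositories"][0]["repositoryUri"])
```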
- Run the following commands one by one on the EC2 terminal:

```bash
sudo apt-get update -y
sudo apt-get upgrade
```

Required (installs Docker and adds the ubuntu user to the docker group):

```bash
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker ubuntu
newgrp docker
```
- Configure EC2 as a self-hosted GitHub runner: Settings > Actions > Runners > New self-hosted runner > choose the OS, then run the displayed commands one by one on the EC2 terminal.
To configure GitHub Actions for deployment, set the following secrets in your GitHub repository:
Secrets:

- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- AWS_DEFAULT_REGION
- ECR_REPO (URI or repository name)
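Before saving the keys as repository secrets, you can sanity-check them locally with a short boto3 call (a sketch, assuming the three AWS variables are exported in your environment):

```python
import boto3

# Picks up AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_DEFAULT_REGION automatically.
sts = boto3.client("sts")

# If the credentials are valid, this prints the account ID and the IAM user ARN.
identity = sts.get_caller_identity()
print(identity["Account"], identity["Arn"])
```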
Project structure:

```
US-VISA-MACHINE-LEARNING-END-TO-END-PROJECT/
├── .github/workflows/
│   └── aws.yaml
├── artifact/
│   └── 10_05_2024_03_23_14 (timestamp)/
│       ├── data_ingestion/
│       │   ├── feature_store/
│       │   │   └── EasyVisa.csv
│       │   └── ingested/
│       │       ├── test.csv
│       │       └── train.csv
│       ├── data_transformation/
│       │   ├── transformed/
│       │   │   ├── test.npy
│       │   │   └── train.npy
│       │   └── transformed_object/
│       │       └── preprocessing.pkl
│       ├── data_validation/
│       │   └── drift_report/
│       │       └── report.yaml
│       └── model_trainer/
│           └── trained_model/
│               └── model.pkl
├── config/
│   ├── model.yaml
│   └── schema.yaml
├── flowchart/
├── logs/
├── notebook/
│   ├── catboost/
│   ├── boston_data_drift_report.html
│   ├── data_drift_demo_evidently.ipynb
│   ├── EasyVisa.csv
│   ├── EDA_us_visa.ipynb
│   ├── Feature_Engineering_and_Model_Training.ipynb
│   └── mongodb_demo.ipynb
├── static/
│   └── css/
│       └── style.css
├── templates/
│   └── usvisa.html
├── us_visa/
│   ├── __init__.py
│   ├── __pycache__/
│   ├── cloud_storage/
│   │   ├── __init__.py
│   │   └── aws_storage.py
│   ├── components/
│   │   ├── __pycache__/
│   │   ├── __init__.py
│   │   ├── data_ingestion.py
│   │   ├── data_transformation.py
│   │   ├── data_validation.py
│   │   ├── model_evaluation.py
│   │   ├── model_pusher.py
│   │   └── model_trainer.py
│   ├── configuration/
│   │   ├── __pycache__/
│   │   ├── logs/
│   │   ├── __init__.py
│   │   ├── aws_connection.py
│   │   └── mongo_db_connection.py
│   ├── constants/
│   │   ├── __pycache__/
│   │   └── __init__.py
│   ├── data_access/
│   │   ├── __pycache__/
│   │   ├── __init__.py
│   │   └── usvisa_data.py
│   ├── entity/
│   │   ├── __pycache__/
│   │   ├── __init__.py
│   │   ├── artifact_entity.py
│   │   ├── config_entity.py
│   │   ├── estimator.py
│   │   └── s3_estimator.py
│   ├── exception/
│   │   ├── __pycache__/
│   │   └── __init__.py
│   ├── logger/
│   │   ├── __pycache__/
│   │   └── __init__.py
│   ├── pipeline/
│   │   ├── __pycache__/
│   │   ├── __init__.py
│   │   ├── prediction_pipeline.py
│   │   └── training_pipeline.py
│   └── utils/
│       ├── __pycache__/
│       ├── __init__.py
│       └── main_utils.py
├── us_visa.egg-info/
├── .dockerignore
├── .gitignore
├── app.py
├── demo.py
├── Dockerfile
├── LICENSE
├── README.md
├── requirements.txt
├── setup.py
└── template.py
```