The dataset

The goal is to predict the price of a given car (regression analysis).

There are 19 independent variables:

- Make: Company of the car
- Year: Manufacturing year of the car
- Kilometer: Total kilometers driven
- Fuel Type: Fuel type of the car
- Transmission: Gear transmission of the car
- Color: Color of the car
- Owner: Number of previous owners
- Seller Type: Tells if the car is sold by an individual or a dealer
- Engine: Engine capacity of the car in cc
- Drivetrain: AWD/RWD/FWD
- Length: Length of the car in mm
- Width: Width of the car in mm
- Height: Height of the car in mm
- Seating Capacity: Maximum number of people that can fit in the car
- Fuel Tank Capacity: Maximum fuel capacity of the car in litres
- TorquePower: Torque output of the car
- TorquePowerRPM: Engine speed (RPM) at which peak torque is delivered
- HorsePower: Horsepower of the car
- HorsePowerRPM: Engine speed (RPM) at which peak horsepower is delivered

Target variable:

- Price: Price of the given car
Dataset Source Link : https://www.kaggle.com/datasets/nehalbirla/vehicle-dataset-from-cardekho?select=car+details+v4.csv
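For a quick first look at the data, you can load the CSV downloaded from the Kaggle page above with pandas. A minimal sketch, assuming the file keeps its Kaggle name (`car details v4.csv`) and the target column is `Price`:

```python
import pandas as pd

# Load the CarDekho CSV downloaded from the Kaggle link above
# (adjust the path to wherever you saved the file).
df = pd.read_csv("car details v4.csv")

print(df.shape)                 # rows x columns (19 features + target)
print(df.dtypes)                # column types
print(df["Price"].describe())   # distribution of the target
```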
Clone the repository:

```bash
git clone https://github.com/Apheironn/End-to-end-Machine-Learning-Project-with-MLflow
```

Create a conda environment and install the requirements:

```bash
conda create -n mlproj python=3.11 -y
conda activate mlproj
pip install -r requirements.txt
```

Finally, run the app:

```bash
python app.py
```
Now open up your local host and port:

```
localhost:8080
```

First, train the model on:

```
localhost:8080/train
```

Finally, go back to the main address and enter values to estimate the price of the vehicle:

```
localhost:8080/
```
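The same two steps can also be driven from a script. A minimal sketch, assuming `/train` responds to a GET request and the main page accepts a form POST; the form field names below are illustrative assumptions, so check the app's HTML templates for the real ones:

```python
import requests

BASE = "http://localhost:8080"

# Kick off training (the README routes training through /train).
resp = requests.get(f"{BASE}/train")
print(resp.status_code, resp.text[:200])

# Submit feature values to the main page. Field names are assumptions
# for illustration -- the app's form defines the actual ones.
payload = {
    "make": "Honda",
    "year": 2018,
    "kilometer": 45000,
    "fuel_type": "Petrol",
    "transmission": "Manual",
}
resp = requests.post(f"{BASE}/", data=payload)
print(resp.status_code)
```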
Project development steps:

Create a repository in your GitHub account:
- Created a new GitHub repository to host the project, making version control and collaboration easier.
Create the structure using template.py:
- Organized the project structure with a template.py script, establishing a clear layout for code and resources (a minimal sketch follows).
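A sketch of what such a template.py typically does: walk a list of relative paths and create any missing folders and empty files. The path list here is an illustrative assumption; the real one lives in the repository.

```python
import os
from pathlib import Path

# Illustrative subset of the project layout -- the actual list in
# template.py is longer and project-specific.
list_of_files = [
    "src/mlProject/__init__.py",
    "src/mlProject/components/__init__.py",
    "src/mlProject/pipeline/__init__.py",
    "config/config.yaml",
    "app.py",
    "requirements.txt",
]

for filepath in list_of_files:
    path = Path(filepath)
    if path.parent != Path("."):
        os.makedirs(path.parent, exist_ok=True)  # create parent dirs
    if not path.exists():
        path.touch()                             # create empty file
```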
Implementing setup.py:
- Implemented a setup.py file to define project dependencies and metadata, simplifying package installation and distribution (sketch below).
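A minimal sketch of a typical setup.py for a src/-layout project like this one; the metadata values are placeholders, not the repository's actual ones.

```python
from setuptools import find_packages, setup

setup(
    name="mlproject",          # placeholder package name
    version="0.0.1",
    author="<author>",         # placeholder metadata
    description="End-to-end ML project for car price prediction",
    # Packages live under src/, so map the root package dir there.
    package_dir={"": "src"},
    packages=find_packages(where="src"),
)
```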
Logging implementation:
- Integrated logging mechanisms throughout the project, allowing effective tracking of code execution and error debugging (a sketch of the setup follows).
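A common pattern for this kind of setup: configure one logger that writes both to a log file and to stdout. A minimal sketch, with the log path and format as assumptions:

```python
import logging
import os
import sys

LOG_DIR = "logs"
os.makedirs(LOG_DIR, exist_ok=True)

logging.basicConfig(
    level=logging.INFO,
    format="[%(asctime)s: %(levelname)s: %(module)s: %(message)s]",
    handlers=[
        # Persist logs to a file and mirror them to the console.
        logging.FileHandler(os.path.join(LOG_DIR, "running_logs.log")),
        logging.StreamHandler(sys.stdout),
    ],
)

logger = logging.getLogger("mlProjectLogger")
logger.info("Logging is configured.")
```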
Data ingestion:
- Developed data ingestion routines to fetch data from various sources (databases, files, APIs) and prepare it for further processing (sketch below).
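As an illustration of the ingestion step, a sketch that downloads the raw CSV from a configured URL into an artifacts folder. The URL and paths are placeholders; in projects like this they usually come from a config file.

```python
import os
import urllib.request

# Placeholder values -- in the project these would come from config.yaml.
SOURCE_URL = "https://example.com/car_details_v4.csv"
RAW_DATA_PATH = "artifacts/data_ingestion/data.csv"

def ingest_data(source_url: str, out_path: str) -> str:
    """Download the dataset if it is not already present locally."""
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    if not os.path.exists(out_path):
        urllib.request.urlretrieve(source_url, out_path)
    return out_path

ingest_data(SOURCE_URL, RAW_DATA_PATH)
```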
Data validation:
- Implemented data validation steps to ensure the integrity and quality of incoming data, reducing the risk of incorrect inputs (sketch below).
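One common way to implement this step: compare incoming columns against an expected schema and report anything missing. The schema here is a truncated placeholder, not the project's real schema file.

```python
import pandas as pd

# Truncated placeholder schema -- the real one would list all 19
# feature columns plus the target.
EXPECTED_COLUMNS = {"Make", "Year", "Kilometer", "Fuel Type", "Price"}

def validate_columns(csv_path: str) -> bool:
    """Return True only if every expected column is present."""
    df = pd.read_csv(csv_path)
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        print(f"Validation failed, missing columns: {missing}")
        return False
    return True
```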
Data transformation:
- Applied data transformation techniques, such as feature engineering and preprocessing, to prepare data for model training (sketch below).
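A sketch of the transformation step using scikit-learn, assuming numeric columns are imputed and scaled while categoricals are one-hot encoded; the column lists are illustrative subsets of the 19 features.

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative column subsets -- the real lists cover all features.
numeric_cols = ["Year", "Kilometer", "Engine", "Length", "Width"]
categorical_cols = ["Make", "Fuel Type", "Transmission", "Drivetrain"]

preprocessor = ColumnTransformer(
    transformers=[
        ("num", Pipeline([
            ("impute", SimpleImputer(strategy="median")),
            ("scale", StandardScaler()),
        ]), numeric_cols),
        ("cat", Pipeline([
            ("impute", SimpleImputer(strategy="most_frequent")),
            ("onehot", OneHotEncoder(handle_unknown="ignore")),
        ]), categorical_cols),
    ]
)
```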
Model trainer:
- Developed the model training phase, including selecting an appropriate algorithm, tuning hyperparameters, and fitting the model to the training data (sketch below).
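A sketch of the training step. The algorithm and hyperparameter grid are illustrative choices, not necessarily what the repository uses, and the input path assumes an already-transformed CSV from the previous step.

```python
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Placeholder path -- assumes the transformation step saved a numeric,
# model-ready CSV with the target column `Price`.
df = pd.read_csv("artifacts/data_transformation/train.csv")
X, y = df.drop(columns=["Price"]), df["Price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# RandomForest is illustrative -- algorithm selection is part of this step.
search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 20]},
    cv=3,
)
search.fit(X_train, y_train)
print("Best params:", search.best_params_)
print("R^2 on held-out split:", search.score(X_test, y_test))

joblib.dump(search.best_estimator_, "artifacts/model.joblib")
```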
Prediction pipeline:
- Created a prediction pipeline that takes in new data, applies the necessary transformations, and generates predictions using the trained model (sketch below).
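The prediction pipeline mirrors training: load the saved preprocessor and model, apply the same transformations, and predict. A sketch with placeholder artifact paths:

```python
import joblib
import pandas as pd

# Placeholder artifact paths -- the real ones come from the project config.
preprocessor = joblib.load("artifacts/preprocessor.joblib")
model = joblib.load("artifacts/model.joblib")

def predict_price(features: dict) -> float:
    """Transform one record the same way as the training data, then predict."""
    row = pd.DataFrame([features])
    transformed = preprocessor.transform(row)
    return float(model.predict(transformed)[0])

# Example record; keys must match the training column names,
# and a real call needs all the columns the preprocessor expects.
print(predict_price({
    "Make": "Honda", "Year": 2018, "Kilometer": 45000,
    "Fuel Type": "Petrol", "Transmission": "Manual",
}))
```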
Set up MLflow and Docker:
- Integrated MLflow to track experiments, model versions, and performance metrics, enhancing model management and reproducibility (sketch below).
- Set up Docker to containerize the application, ensuring consistency across deployment environments.
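A minimal sketch of the MLflow integration, logging parameters, a metric, a tag, and the model itself. The tracking URI, experiment name, and dummy training data are all placeholders; a remote tracking server can be swapped in via the URI.

```python
import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.ensemble import RandomForestRegressor

mlflow.set_tracking_uri("file:./mlruns")          # placeholder: local store
mlflow.set_experiment("car-price-regression")     # placeholder name

with mlflow.start_run():
    params = {"n_estimators": 100, "max_depth": 10}
    model = RandomForestRegressor(**params, random_state=42)

    # Dummy data stands in for the real transformed training set.
    X = np.random.rand(50, 5)
    y = np.random.rand(50)
    model.fit(X, y)

    mlflow.log_params(params)                          # hyperparameters
    mlflow.log_metric("train_r2", model.score(X, y))   # performance metric
    mlflow.set_tag("stage", "experiment")              # tag the run
    mlflow.sklearn.log_model(model, artifact_path="model")  # model version
```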
Deployment on EC2 with App Runner and AWS CI/CD:
- Utilized AWS EC2 instances for deployment, providing scalable and customizable infrastructure for hosting the application.
- Employed AWS App Runner to streamline the deployment process, automating application scaling and management.
- Implemented Continuous Integration/Continuous Deployment (CI/CD) pipelines to automate testing, building, and deploying updates to the application on AWS, ensuring a smoother development and deployment workflow.
About MLflow:
- It's production-grade.
- It traces all of your experiments.
- It handles logging and tagging of your models.