Welcome to this MLOps project, which demonstrates a robust pipeline for managing vehicle insurance data. It showcases the tools, techniques, services, and features involved in building and deploying a machine learning pipeline for real-world data management.
- Start by executing `template.py` to create the initial project template, which includes the required folder structure and placeholder files.
- Write the setup for importing local packages in the `setup.py` and `pyproject.toml` files.
- Tip: Learn more about these files from `crashcourse.txt`.
- Create a virtual environment and install the required dependencies from `requirements.txt`:
```bash
conda create -p vehicle python=3.10 -y
conda activate vehicle
pip install -r requirements.txt
```
- Verify the local packages by running:
```bash
pip list
```
- Sign up for MongoDB Atlas and create a new project.
- Set up a free M0 cluster, configure the username and password, and allow access from any IP address (`0.0.0.0/0`).
- Retrieve the MongoDB connection string for Python and save it (replace `<password>` with your password).
- Create a folder named `notebook`, add the dataset, and create a notebook file `mongoDB_demo.ipynb`.
- Use the notebook to push data to the MongoDB database (a minimal pymongo sketch follows this list).
- Verify the data in MongoDB Atlas under Database > Browse Collections.
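The notebook's push step boils down to reading the dataset and inserting its rows with `pymongo`. Below is a minimal sketch; the database name `vehicle_db`, collection name `vehicle_data`, and dataset path `notebook/data.csv` are assumptions to adjust, and the connection string is read from the `MONGODB_URL` environment variable set later in this guide:
```python
# Minimal sketch: push a CSV dataset into a MongoDB Atlas collection.
import os

import pandas as pd
from pymongo import MongoClient

# Connection string saved from Atlas, exposed as an environment variable.
client = MongoClient(os.getenv("MONGODB_URL"))

# Hypothetical database and collection names -- use your own.
collection = client["vehicle_db"]["vehicle_data"]

# Convert the DataFrame rows to dicts, the shape MongoDB expects.
df = pd.read_csv("notebook/data.csv")
records = df.to_dict(orient="records")
collection.insert_many(records)
print(f"Inserted {len(records)} records")
```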
- Create logging and exception handling modules, and test them in a demo file `demo.py` (both are sketched below).
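A minimal sketch of what the two modules might contain; the file names `logger.py` and `exception.py`, the `logs` directory, and the class name are assumptions rather than the project's prescribed layout:
```python
# logger.py -- project-wide logger writing to a timestamped file (assumed layout).
import logging
import os
from datetime import datetime

LOG_DIR = "logs"
os.makedirs(LOG_DIR, exist_ok=True)
LOG_FILE = os.path.join(LOG_DIR, f"{datetime.now():%Y_%m_%d_%H_%M_%S}.log")

logging.basicConfig(
    filename=LOG_FILE,
    format="[%(asctime)s] %(levelname)s - %(message)s",
    level=logging.INFO,
)

# exception.py -- custom exception that records file name and line number.
import sys

class ProjectException(Exception):
    def __init__(self, error: Exception):
        super().__init__(str(error))
        _, _, tb = sys.exc_info()  # traceback of the exception being handled
        self.message = str(error)
        if tb is not None:
            self.message = (
                f"Error in {tb.tb_frame.f_code.co_filename} "
                f"at line {tb.tb_lineno}: {error}"
            )

    def __str__(self) -> str:
        return self.message

# demo.py -- exercise both modules:
# try:
#     1 / 0
# except Exception as e:
#     logging.error(str(e))
#     raise ProjectException(e)
```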
- Analyze and engineer features in the `EDA and Feature Engg` notebook for further processing in the pipeline.
- Define MongoDB connection functions in `configuration.mongo_db_connections.py` (sketched after the environment-variable note below).
- Develop data ingestion components in the `data_access` and `components.data_ingestion.py` files to fetch and transform data.
- Update `entity/config_entity.py` and `entity/artifact_entity.py` with the relevant ingestion configurations.
- Run `demo.py` after setting up the MongoDB connection as an environment variable.
- Set the MongoDB URL:
```bash
# For Bash
export MONGODB_URL="mongodb+srv://<username>:<password>...."
```
```powershell
# For PowerShell
$env:MONGODB_URL = "mongodb+srv://<username>:<password>...."
```
- Note: On Windows, you can also set environment variables through the system settings.
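On the Python side, the connection module can read that variable and hand back a shared client. A minimal sketch; the class name and the cached-client pattern are illustrative assumptions, not the project's actual API:
```python
# configuration/mongo_db_connections.py -- illustrative sketch.
import os

from pymongo import MongoClient

MONGODB_URL_KEY = "MONGODB_URL"  # environment variable set above

class MongoDBClient:
    """Builds one shared MongoClient from the environment and reuses it."""

    client: MongoClient = None

    def __init__(self, database_name: str) -> None:
        if MongoDBClient.client is None:
            mongo_url = os.getenv(MONGODB_URL_KEY)
            if mongo_url is None:
                raise EnvironmentError(f"{MONGODB_URL_KEY} is not set")
            MongoDBClient.client = MongoClient(mongo_url)
        self.database = MongoDBClient.client[database_name]
```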
- Define the schema in `config.schema.yaml` and implement data validation functions in `utils.main_utils.py` (a validation sketch follows).
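Validation usually means checking the ingested DataFrame against the YAML schema. A minimal sketch, assuming the schema file lives at `config/schema.yaml` and lists its expected column names under a `columns` key; both the path and the key are assumptions:
```python
# utils/main_utils.py -- illustrative validation helpers.
import pandas as pd
import yaml

def read_yaml_file(file_path: str) -> dict:
    """Load a YAML file into a plain dict."""
    with open(file_path) as f:
        return yaml.safe_load(f)

def validate_columns(df: pd.DataFrame, schema_path: str) -> bool:
    """Return True only if every column named in the schema is present."""
    schema = read_yaml_file(schema_path)
    expected = set(schema["columns"])  # assumed schema layout
    missing = expected - set(df.columns)
    if missing:
        print(f"Missing columns: {missing}")
    return not missing
```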
- Implement data transformation logic in `components.data_transformation.py` and create `estimator.py` in the `entity` folder (a transformation sketch follows).
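The transformation step typically fits a preprocessing object on the training data and persists it for inference. A minimal scikit-learn sketch; the column names and transformer choices are placeholders, not the project's actual feature set:
```python
# components/data_transformation.py -- illustrative sketch.
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

def build_preprocessor(numeric_cols: list[str], categorical_cols: list[str]) -> ColumnTransformer:
    """Scale numeric features and one-hot encode categorical ones."""
    return ColumnTransformer(
        [
            ("num", StandardScaler(), numeric_cols),
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ]
    )

# Example usage with placeholder column names:
# preprocessor = build_preprocessor(["Age", "Annual_Premium"], ["Gender", "Vehicle_Age"])
# X_train = preprocessor.fit_transform(train_df)
```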
- Define and implement the model training steps in `components.model_trainer.py`, using code from `estimator.py` (a training sketch follows).
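A minimal training sketch; the random-forest choice and the `VehicleModel` wrapper are illustrative assumptions, not the project's prescribed model:
```python
# entity/estimator.py -- illustrative wrapper pairing preprocessor and model.
import pandas as pd

class VehicleModel:
    def __init__(self, preprocessor, trained_model):
        self.preprocessor = preprocessor
        self.trained_model = trained_model

    def predict(self, df: pd.DataFrame):
        # Apply the fitted preprocessing before predicting.
        return self.trained_model.predict(self.preprocessor.transform(df))

# components/model_trainer.py -- illustrative training step.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def train_model(X_train, y_train, X_test, y_test) -> RandomForestClassifier:
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
    return model
```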
- Log in to the AWS console, create an IAM user, and grant it `AdministratorAccess`.
- Set AWS credentials as environment variables:
```bash
# For Bash
export AWS_ACCESS_KEY_ID="YOUR_AWS_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="YOUR_AWS_SECRET_ACCESS_KEY"
```
- Configure the S3 bucket and add the access keys in `constants.__init__.py`.
- Create an S3 bucket named `my-model-mlopsproj11` in the `us-east-1` region.
- Develop code to push/pull models to/from the S3 bucket in `src.aws_storage` and `entity/s3_estimator.py` (a push/pull sketch follows).
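The push/pull logic reduces to boto3 upload and download calls. A minimal sketch; the bucket name matches the one created above, while the keys and local paths are placeholders:
```python
# src/aws_storage -- illustrative S3 push/pull helpers.
import boto3

# Credentials are picked up from the environment variables set earlier.
s3 = boto3.client("s3")
BUCKET_NAME = "my-model-mlopsproj11"

def push_model(local_path: str, s3_key: str) -> None:
    """Upload a serialized model file to the bucket."""
    s3.upload_file(local_path, BUCKET_NAME, s3_key)

def pull_model(s3_key: str, local_path: str) -> None:
    """Download a serialized model file from the bucket."""
    s3.download_file(BUCKET_NAME, s3_key, local_path)

# Example usage with placeholder paths:
# push_model("artifacts/model.pkl", "model-registry/model.pkl")
# pull_model("model-registry/model.pkl", "artifacts/model.pkl")
```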
- Implement model evaluation and deployment components.
- Create the `Prediction Pipeline` and set up `app.py` for API integration.
- Add `static` and `template` directories for the web UI (a minimal `app.py` sketch follows).
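A minimal sketch of what `app.py` might look like; FastAPI and the single route shown are assumptions (any web framework works), and `index.html` is a placeholder template name:
```python
# app.py -- illustrative sketch, assuming FastAPI with Jinja2 templates.
import uvicorn
from fastapi import FastAPI, Request
from fastapi.staticfiles import StaticFiles
from fastapi.templating import Jinja2Templates

app = FastAPI()
app.mount("/static", StaticFiles(directory="static"), name="static")
templates = Jinja2Templates(directory="template")

@app.get("/")
async def index(request: Request):
    # Render the input form from the template directory.
    return templates.TemplateResponse("index.html", {"request": request})

if __name__ == "__main__":
    # Port 5080 matches the port opened on the EC2 instance below.
    uvicorn.run(app, host="0.0.0.0", port=5080)
```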
- Create `Dockerfile` and `.dockerignore`.
- Set up GitHub Actions with AWS authentication by creating secrets in GitHub for:
  - `AWS_ACCESS_KEY_ID`
  - `AWS_SECRET_ACCESS_KEY`
  - `AWS_DEFAULT_REGION`
  - `ECR_REPO`
- Set up an EC2 instance for deployment.
- Install Docker on the EC2 machine.
- Connect EC2 as a self-hosted runner on GitHub.
- Open port 5080 on the EC2 instance.
- Access the deployed app by visiting `http://<public_ip>:5080`.
- Crash course on `setup.py` and `pyproject.toml`: see `crashcourse.txt` for details.
- GitHub Secrets: manage secrets for secure CI/CD pipelines.
- Data Ingestion → Data Validation → Data Transformation
- Model Training → Model Evaluation → Model Deployment
- CI/CD automation with GitHub Actions, Docker, AWS EC2, and ECR
I ran `s3_resource.meta.client.upload_file(PATH_IN_COMPUTER, BUCKET_NAME, KEY)`. The code ran without errors, but the file did not get uploaded.
Run the following command:
```bash
aws sts get-caller-identity
```
If your credentials are correct, you should see output like:
```json
{
    "UserId": "ABC123XYZ456",
    "Account": "123456789012",
    "Arn": "arn:aws:iam::123456789012:user/your-user"
}
```
- ✅ If this works → your credentials are fine; move to Step 4.
- ❌ If you get an error (e.g., `InvalidAccessKeyId`) → move to Step 2.
Unset the incorrectly set environment variables:
```cmd
set AWS_ACCESS_KEY_ID=
set AWS_SECRET_ACCESS_KEY=
```
Then, configure the AWS CLI properly using:
```bash
aws configure
```
Run:
```bash
aws s3 ls
```
- ✅ If this lists your S3 buckets → your credentials work; try running your Python script again.
- ❌ If the error persists → double-check your access keys in the AWS Console (IAM → Users → Security Credentials).
If running `aws sts get-caller-identity` gives `'aws' is not recognized as an internal or external command, operable program or batch file.`, the AWS CLI is missing or not on your PATH.
Run this command to check whether the AWS CLI is installed:
```cmd
where aws
```
- ✅ If it outputs a path like `C:\Program Files\Amazon\AWSCLI\bin\aws.exe` → move to Step 3.
- ❌ If it says `INFO: Could not find files for the given pattern(s).` → move to Step 2.
If the AWS CLI is not installed, download and install it:
1. Download the AWS CLI
   - Go to: AWS CLI Installer
   - Run the installer (`AWSCLIV2.msi`) and follow the setup steps
2. Verify the installation
   - After installation, close and reopen CMD, then check:
```cmd
aws --version
```
- ✅ If it shows something like `aws-cli/2.x.x` → the installation is successful.
Then, try:
```cmd
aws sts get-caller-identity
```
If the previous step worked, configure your AWS credentials:
```cmd
aws configure
```
Enter:
- AWS Access Key ID
- AWS Secret Access Key
- Region name (e.g., `us-east-1`)
- Output format (`json` or `text`)
Then, test S3 access:
```bash
aws s3 ls
```
Great! Now that the AWS CLI is working, let's verify everything step by step before running your Python script again.
Run:
```bash
aws sts get-caller-identity
```
Expected output:
```json
{
    "UserId": "ABC123XYZ456",
    "Account": "123456789012",
    "Arn": "arn:aws:iam::123456789012:user/your-user"
}
```
- ✅ If this works → your credentials are valid.
- ❌ If it still says `InvalidAccessKeyId` → check your IAM permissions and reconfigure AWS (`aws configure`).
Run:
```bash
aws s3 ls
```
Expected output: a list of your S3 buckets, e.g.:
```
2025-02-03 my-model-mlopsproj
```
- ✅ If your bucket appears → move to Step 3.
- ❌ If no bucket appears or access is denied → you might not have S3 permissions in IAM.
Before running your Python script, test uploading a sample file manually:
```bash
echo "Hello MLOps" > test_file.txt
aws s3 cp test_file.txt s3://my-model-mlopsproj/
```
If this works, check that the file is in S3:
```bash
aws s3 ls s3://my-model-mlopsproj/
```
- ✅ If the upload works → your AWS setup is fine; move to Step 4.
- ❌ If the upload fails → you might not have `s3:PutObject` permissions.
Now, rerun your Python script:
```bash
python your_script.py
```
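If the script still reports success while nothing appears in the bucket, it helps to verify the upload from Python immediately after the call. Below is a minimal sketch using boto3's `head_object`; the local path, bucket name, and key are placeholders taken from the examples above, not fixed project values:
```python
# Illustrative upload-and-verify sketch; all names are placeholders.
import boto3
from botocore.exceptions import ClientError

s3_client = boto3.client("s3")
PATH_IN_COMPUTER = "test_file.txt"   # placeholder local file path
BUCKET_NAME = "my-model-mlopsproj"   # placeholder bucket name
KEY = "test_file.txt"                # placeholder object key

s3_client.upload_file(PATH_IN_COMPUTER, BUCKET_NAME, KEY)

try:
    # head_object raises ClientError if the object is not in the bucket,
    # so a clean return confirms the upload actually landed.
    s3_client.head_object(Bucket=BUCKET_NAME, Key=KEY)
    print(f"Upload confirmed: s3://{BUCKET_NAME}/{KEY}")
except ClientError as err:
    print(f"Upload not found: {err}")
```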