
Log Analysis Pipeline with ETL, RDS, and Serverless API Integration

Description

In this project, I built an end-to-end data pipeline on AWS by uploading log data to S3, processing it with a Python-based ETL, and storing it in an RDS PostgreSQL database. I created a Lambda function behind an API Gateway (secured with an API key) to provide log statistics (hourly, daily, or weekly). To streamline deployment, I set up CI/CD using GitHub Actions to automatically deploy the Lambda function on every commit.

Utilities Used

  • Python
  • AWS S3
  • AWS RDS (PostgreSQL)
  • AWS Lambda
  • AWS API Gateway
  • boto3, psycopg2, pandas
  • GitHub Actions
  • pgAdmin

Project Walk-through

Step 1: Upload the Dataset to S3

  1. Go to Kaggle Log Analysis Dataset and download the log file.
  2. Log in to your AWS Console and open the S3 service.
  3. Create a new bucket and give it a globally unique name.
  4. Open the bucket > Upload > Add files > Choose your downloaded file > Upload (or script the upload with boto3, as sketched below)

S3 Upload Example
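
The console flow above is all the project needs, but the same upload can be scripted with boto3. A minimal sketch, assuming placeholder bucket and file names (substitute your own):

```python
import boto3

# Placeholder names; replace with your own bucket and downloaded log file.
BUCKET_NAME = "my-log-analysis-bucket"
LOCAL_FILE = "logfile.log"
S3_KEY = "raw/logfile.log"

# Credentials are resolved from the environment, ~/.aws/credentials, or an attached IAM role.
s3 = boto3.client("s3")

# Create the bucket if it does not exist yet (outside us-east-1 you must also
# pass CreateBucketConfiguration with a LocationConstraint).
s3.create_bucket(Bucket=BUCKET_NAME)

# Upload the dataset.
s3.upload_file(LOCAL_FILE, BUCKET_NAME, S3_KEY)
print(f"Uploaded {LOCAL_FILE} to s3://{BUCKET_NAME}/{S3_KEY}")
```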

Step 2: Create RDS Database with Schema

  1. Go to AWS Console > RDS > Create database
    • Engine: PostgreSQL
    • Templates: Free Tier
    • DB instance identifier: log-db
    • Username: postgres
    • Password: StrongPassword
    • Enable public access: Yes
    • VPC security group: Allow PostgreSQL port 5432
    • Create database
  2. Go to postgresql.org and download the PostgreSQL installer (.exe), which includes pgAdmin.
  3. In pgAdmin, run a query to create the table (one possible schema is sketched below).

RDS Example
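
The table you create depends on the columns in the Kaggle dataset; the schema below is only an assumed example of what the CREATE TABLE query might look like, shown here executed through psycopg2 (you can paste the SQL directly into pgAdmin instead):

```python
import psycopg2

# Assumed connection details; use your RDS endpoint, database name, and password.
conn = psycopg2.connect(
    host="your-rds-endpoint",
    dbname="your-db-name",
    user="postgres",
    password="YourStrongPassword",
    port=5432,
)

# Assumed schema for web-server access logs; adjust columns to match the dataset.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS logs (
    id          SERIAL PRIMARY KEY,
    ip_address  TEXT,
    log_time    TIMESTAMP,
    method      TEXT,
    endpoint    TEXT,
    status_code INTEGER
);
"""

# The `with conn` block commits on success and rolls back on error.
with conn, conn.cursor() as cur:
    cur.execute(CREATE_TABLE_SQL)
conn.close()
```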

Step 3: Extract & Load (ETL)

  1. Launch an EC2 instance and update it.
  2. Create and activate a Python virtual environment, then install boto3, psycopg2, and pandas with pip.
  3. Create a script etl_to_rds.py (available in this repo; a simplified sketch follows below).
  4. Run the script.

pgAdmin Example
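
The authoritative ETL code is etl_to_rds.py in this repo. As a rough sketch of the general shape (extract from S3, transform with pandas, load into RDS with psycopg2), it could look like the following; the bucket, key, log-line regex, and table schema are all assumptions carried over from the earlier sketches:

```python
import re

import boto3
import pandas as pd
import psycopg2
from psycopg2.extras import execute_values

BUCKET_NAME = "my-log-analysis-bucket"   # assumed bucket name
S3_KEY = "raw/logfile.log"               # assumed object key

# Simplified Apache/Nginx-style access-log pattern; the real dataset may differ.
LOG_PATTERN = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) \S+" (\d{3})')

# Extract: download the raw log file from S3.
s3 = boto3.client("s3")
body = s3.get_object(Bucket=BUCKET_NAME, Key=S3_KEY)["Body"].read().decode("utf-8")

# Transform: parse each line into structured fields.
rows = [m.groups() for m in map(LOG_PATTERN.match, body.splitlines()) if m]
df = pd.DataFrame(rows, columns=["ip_address", "log_time", "method", "endpoint", "status_code"])
df["log_time"] = pd.to_datetime(df["log_time"], format="%d/%b/%Y:%H:%M:%S %z", errors="coerce")
df = df.dropna(subset=["log_time"])

# Convert to plain Python types so psycopg2 can adapt them.
records = [
    (r.ip_address, r.log_time.to_pydatetime(), r.method, r.endpoint, int(r.status_code))
    for r in df.itertuples(index=False)
]

# Load: bulk-insert into the RDS table created in Step 2.
conn = psycopg2.connect(host="your-rds-endpoint", dbname="your-db-name",
                        user="postgres", password="YourStrongPassword")
with conn, conn.cursor() as cur:
    execute_values(
        cur,
        "INSERT INTO logs (ip_address, log_time, method, endpoint, status_code) VALUES %s",
        records,
    )
conn.close()
print(f"Loaded {len(records)} rows into RDS")
```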

Step 4: Create Lambda Function

  1. Go to AWS Lambda > Create function.
    • Name: your-function-name
    • Runtime: Python 3.12
    • Role: Create new role with basic Lambda permissions
  2. Set environment variables (in Lambda console Configuration → Environment variables).
    • DB_HOST = your-rds-endpoint
    • DB_NAME = your-db-name
    • DB_USER = your-db-username
    • DB_PASS = YourStrongPassword
  3. Create a lambda_function.py file (available in this repo; a simplified sketch follows below) and install the required library (psycopg2-binary) into the same folder.
  4. Zip the contents and name the archive lambda_deploy.zip.
  5. Go to AWS Lambda Console → Your Function → Code → Upload from → .zip file → choose lambda_deploy.zip → Deploy
  6. In the Lambda console, go to Test → Configure test event, use { "queryStringParameters": { "period": "hour" } } as the event, then Run and check the output.

Function Example
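
The deployed handler is the lambda_function.py in this repo; the sketch below only illustrates the approach described above: read the period query parameter, aggregate log counts in RDS with date_trunc, and return JSON. The table and column names are the assumed ones from the earlier sketches:

```python
import json
import os

import psycopg2

# Map the ?period= query parameter to a PostgreSQL date_trunc granularity.
PERIODS = {"hour": "hour", "day": "day", "week": "week"}


def lambda_handler(event, context):
    period = (event.get("queryStringParameters") or {}).get("period", "hour")
    if period not in PERIODS:
        return {"statusCode": 400,
                "body": json.dumps({"error": "period must be hour, day, or week"})}

    # Connection details come from the environment variables configured in Step 4.
    conn = psycopg2.connect(
        host=os.environ["DB_HOST"],
        dbname=os.environ["DB_NAME"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASS"],
        connect_timeout=5,
    )
    try:
        with conn.cursor() as cur:
            # Assumed schema: a `logs` table with a `log_time` timestamp column.
            # The interpolated value is safe because `period` is whitelisted above.
            cur.execute(
                f"SELECT date_trunc('{PERIODS[period]}', log_time) AS bucket, COUNT(*) "
                f"FROM logs GROUP BY bucket ORDER BY bucket"
            )
            stats = [{"period_start": str(bucket), "count": count}
                     for bucket, count in cur.fetchall()]
    finally:
        conn.close()

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"period": period, "stats": stats}),
    }
```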

Step 5: Configure API Gateway with API key

  1. Search “API Gateway” in the AWS search bar → click it.
  2. Click Create API → Choose HTTP API → Click Build
  3. Under add integration, choose Lambda and select your function.
  4. Configure routes → Add route → Method: GET → Resource path: /stats
  5. Configure stages → Stage name: default (keep default) → Click Create
  6. Go to API Gateway → Create API Keys
  7. Go to Usage Plans → Create Usage Plan → Add API stage → Choose your API → Choose the stage (default) → Attach the API key you created (clients then pass this key in the x-api-key header; see the example call below)


API Gateway Example
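
With the key attached through the usage plan, requests to the /stats route must carry it in the x-api-key header. A quick smoke test from Python using the requests library (not part of the pipeline itself); the invoke URL and key below are placeholders:

```python
import requests

# Placeholder values; use your own invoke URL and the API key created above.
API_URL = "https://abc123.execute-api.us-east-1.amazonaws.com/default/stats"
API_KEY = "your-api-key"

response = requests.get(
    API_URL,
    params={"period": "day"},          # hour, day, or week
    headers={"x-api-key": API_KEY},    # required because the usage plan enforces the key
)
response.raise_for_status()
print(response.json())
```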

Step 6: Set Up CI/CD with GitHub Actions

  1. Create a GitHub repo.
  2. Push your lambda_function.py to the repo.
  3. Create a GitHub Actions workflow file (.github/workflows/deploy.yml).
  4. In your repo, go to Settings > Secrets and variables > Actions > New repository secret.
  5. Add these two secrets: AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.

CI/CD with GitHub Actions Example
