Welcome! I'm a data engineer trained via the Springboard Data Engineering Bootcamp with hands-on projects across Azure, SQL, Python, Airflow, and more.
I build robust, scalable data pipelines that bring order to complex data environments and are designed for production workloads.
🚀 Project Timeline
The table below is auto-generated from my SQL Server progress tracker (tblMiniProjectProgress) via a custom Python workflow.
| Project | Description | Repository Link | Last Update |
|---|---|---|---|
| Guided Capstone Project | This guided capstone builds an end-to-end data engineering pipeline for high-frequency equity market data. It designs a relational schema for trade and quote records, ingests daily CSV and JSON files into Spark, and performs batch ETL operations with deduplication and partitioning. The pipeline computes analytical metrics—such as trade indicators, 30-minute moving averages, and bid/ask price movements—and stores results in cloud-based data layers for market trend analysis. | GitHub Repo | 11/11/2025 |
| Unguided Capstone Project | This unguided capstone investigates how the diversity of movie soundtrack genres correlates with audience reception and popularity. Data from The Movie Database (TMDb) and Discogs APIs is integrated to create a unified dataset linking films to their soundtracks. The project uses Python, SQL, and Spark-based ETL pipelines to extract, transform, and analyze relationships between genre variety, release era, and popularity metrics. | GitHub Repo | 11/08/2025 |
| Kafka Mini Project | Built a streaming fraud detection system with Apache Kafka and Python. Deployed a Kafka cluster via Docker Compose, implemented a transaction generator and fraud detector using kafka-python, and routed suspicious transactions to separate topics for real-time monitoring. Demonstrates event streaming, producers, consumers, and containerization. | GitHub Repo | 09/11/2025 |
| Apache Airflow Log Analyzer Mini Project | Built Apache Airflow DAGs to automate Yahoo Finance stock data ingestion, storage, and querying, then extended with a Python log analyzer to monitor execution errors. Demonstrates orchestration, scheduling, operator use, and pipeline monitoring. | GitHub Repo | 08/31/2025 |
| Apache Spark Optimization Mini Project | Optimized PySpark jobs by analyzing query execution plans and rewriting transformations for efficiency. Applied techniques such as reducing shuffles, tuning partitions, selecting efficient operators, and choosing optimal data formats. Demonstrates performance tuning for large-scale Spark ETL workloads using Python and PySpark. | GitHub Repo | 08/08/2025 |
| Apache Spark Post Sales Redesign Mini Project | Redesigned a Hadoop MapReduce post-sales reporting system using Spark. Processed automobile incident data to add make/year attributes and aggregate accidents by vehicle. Implemented RDD transformations, groupByKey, and reduceByKey to generate reports efficiently, highlighting Spark’s performance advantage over MapReduce. | GitHub Repo | 08/05/2025 |
| Azure Synapse Analytics Mini Project | Built a data pipeline in Azure Synapse Analytics to load product data from Azure Data Lake into a dedicated SQL pool. Implemented a data flow with inserts and upserts, handling schema drift and Type 1 SCD updates, and orchestrated ingestion using Synapse Studio pipelines. | GitHub Repo | 07/18/2025 |
| Azure Databricks Mini Project | Implemented a PySpark mini-project in Azure Databricks to ingest, query, and transform datasets. Built solutions using PySpark DataFrame syntax rather than Spark SQL, demonstrating data ingestion, transformations, and query patterns within notebooks submitted as part of the Springboard bootcamp. | GitHub Repo | 07/16/2025 |
| MySQL Python Data Pipeline Mini Project | Developed a Python and SQL data pipeline for an event ticketing system. Designed a MySQL table schema, ingested CSV sales data via Python connectors, and implemented queries to analyze ticket popularity and sales trends, showcasing ETL and database integration skills. | GitHub Repo | 07/14/2025 |
| PostgreSQL Tuning Mini Project | Optimized PostgreSQL queries on a computer science publications dataset. Created tables, ingested CSVs, and wrote queries to analyze conferences, authors, and publication trends. Improved performance by designing indexes, refining join/filter logic, and evaluating execution plans with EXPLAIN, demonstrating query tuning and indexing strategies. | GitHub Repo | 03/21/2025 |
| Advanced MySQL Query Tuning Mini Project | Analyzed EuroCup 2016 data with advanced SQL queries. Imported CSV datasets into MySQL, designed a schema with match, player, and referee details, and implemented queries covering match outcomes, penalty shootouts, player stats, bookings, substitutions, and referee activity to explore tournament dynamics. | GitHub Repo | 03/08/2025 |
| Python OOP Mini Project | Implemented a simplified banking system in Python using OOP principles. Modeled customers, accounts, employees, and services such as loans and credit cards. Applied PEP-8 style, logging, and exception handling, with UML-based design and a command-line interface for deposits, withdrawals, and account management. | GitHub Repo | 02/13/2025 |
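
For readers curious how a table like the one above can be produced, here is a minimal sketch of the rendering step only. It is not the actual workflow: the row keys (`project`, `description`, `repo_url`, `last_update`), the helper names, and the sample row are all hypothetical, since the real `tblMiniProjectProgress` schema is not shown here. The sketch assumes the rows have already been fetched from SQL Server into plain dictionaries.

```python
from datetime import date

# Column headers copied from the table above.
HEADERS = ["Project", "Description", "Repository Link", "Last Update"]


def escape_cell(value):
    """Escape pipes and flatten newlines so a value stays inside one cell."""
    return str(value).replace("|", "\\|").replace("\n", " ").strip()


def render_markdown_table(rows):
    """Render tracker rows as a Markdown table, newest update first.

    Each row is a dict with hypothetical keys: project, description,
    repo_url, and last_update (a datetime.date).
    """
    ordered = sorted(rows, key=lambda r: r["last_update"], reverse=True)
    lines = [
        "| " + " | ".join(HEADERS) + " |",
        "|" + "---|" * len(HEADERS),
    ]
    for r in ordered:
        cells = [
            r["project"],
            r["description"],
            f"[GitHub Repo]({r['repo_url']})",
            r["last_update"].strftime("%m/%d/%Y"),
        ]
        lines.append("| " + " | ".join(escape_cell(c) for c in cells) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder row for illustration only; not real tracker data.
    sample = [
        {
            "project": "Example Mini Project",
            "description": "Placeholder description.",
            "repo_url": "https://example.com/repo",
            "last_update": date(2025, 1, 1),
        }
    ]
    print(render_markdown_table(sample))
```

Escaping the pipe character matters because any `|` inside a description would otherwise split the cell and shift the remaining columns.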
#SQL #Azure #Airflow #Spark #Kafka #DataPipeline #ETL #DataEngineering #Monitoring #Streaming #Automation
- 📅 35+ weeks of guided, project-based curriculum
- ✏️ 10 mini-projects + 1 guided and 1 unguided capstone
- 🌐 Focus: cloud computing, big data, orchestration, performance optimization
- ✅ Verified by mentor checkpoints and progress metrics
Skills applied in real projects: data pipelines, cloud orchestration, SQL optimization, and dashboarding.
📧 Reach me on LinkedIn
🧠 Ask me about bootcamp time tracking, SQL optimization, or orchestration frameworks!
Generated automatically via Python on 11-11-2025 18:23:50