GitHub - mtholahan/springboard-projects: A meta-repo for my Springboard data engineering boot camp projects.

📊 Springboard Data Engineering Portfolio

Welcome! I'm a data engineer trained via the Springboard Data Engineering Bootcamp with hands-on projects across Azure, SQL, Python, Airflow, and more.

I build robust, scalable data pipelines and solutions that bring order to complex data environments and are designed with production reliability and performance in mind.

🚀 Project Timeline

The table below is auto-generated from my SQL Server progress tracker (tblMiniProjectProgress) via a custom Python workflow.
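
A minimal sketch of the rendering step, assuming the tracker rows have already been fetched from SQL Server (for example with a driver such as pyodbc) as tuples of (project, description, repo URL, last update). The row shape, the `render_project_table` helper, and the example URLs are hypothetical; the actual tblMiniProjectProgress schema and workflow are not shown here.

```python
from datetime import date
from typing import Iterable, Tuple

# Hypothetical row shape: (project name, description, repo URL, last update date).
Row = Tuple[str, str, str, date]


def render_project_table(rows: Iterable[Row]) -> str:
    """Render tracker rows as a Markdown table, newest update first."""
    lines = [
        "| Project | Description | Repository Link | Last Update |",
        "|---|---|---|---|",
    ]
    # Sort by last-update date, descending, to match the table's ordering.
    for name, desc, url, updated in sorted(rows, key=lambda r: r[3], reverse=True):
        # Escape pipes so free-text descriptions cannot break the table layout.
        safe_desc = desc.replace("|", "\\|")
        lines.append(
            f"| {name} | {safe_desc} | [GitHub Repo]({url}) | {updated:%m/%d/%Y} |"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder rows for illustration only.
    sample = [
        ("Python OOP Mini Project", "Banking system.", "https://example.com/oop", date(2025, 2, 13)),
        ("Kafka Mini Project", "Fraud detection.", "https://example.com/kafka", date(2025, 9, 11)),
    ]
    print(render_project_table(sample))
```

Keeping the renderer a pure function of the fetched rows means it can be tested without a database connection.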

Project	Description	Repository Link	Last Update
Guided Capstone Project	This guided capstone builds an end-to-end data engineering pipeline for high-frequency equity market data. It designs a relational schema for trade and quote records, ingests daily CSV and JSON files into Spark, and performs batch ETL operations with deduplication and partitioning. The pipeline computes analytical metrics—such as trade indicators, 30-minute moving averages, and bid/ask price movements—and stores results in cloud-based data layers for market trend analysis.	GitHub Repo	11/11/2025
Unguided Capstone Project	This unguided capstone investigates how the diversity of movie soundtrack genres correlates with audience reception and popularity. Data from The Movie Database (TMDb) and Discogs APIs is integrated to create a unified dataset linking films to their soundtracks. The project uses Python, SQL, and Spark-based ETL pipelines to extract, transform, and analyze relationships between genre variety, release era, and popularity metrics.	GitHub Repo	11/08/2025
Kafka Mini Project	Built a streaming fraud detection system with Apache Kafka and Python. Deployed a Kafka cluster via Docker Compose, implemented a transaction generator and fraud detector using kafka-python, and routed suspicious transactions to separate topics for real-time monitoring. Demonstrates event streaming, producers, consumers, and containerization.	GitHub Repo	09/11/2025
Apache Airflow Log Analyzer Mini Project	Built Apache Airflow DAGs to automate Yahoo Finance stock data ingestion, storage, and querying, then extended with a Python log analyzer to monitor execution errors. Demonstrates orchestration, scheduling, operator use, and pipeline monitoring.	GitHub Repo	08/31/2025
Apache Spark Optimization Mini Project	Optimized PySpark jobs by analyzing query execution plans and rewriting transformations for efficiency. Applied techniques such as reducing shuffles, tuning partitions, selecting efficient operators, and choosing optimal data formats. Demonstrates performance tuning for large-scale Spark ETL workloads using Python and PySpark.	GitHub Repo	08/08/2025
Apache Spark Post Sales Redesign Mini Project	Redesigned a Hadoop MapReduce post-sales reporting system using Spark. Processed automobile incident data to add make/year attributes and aggregate accidents by vehicle. Implemented RDD transformations, groupByKey, and reduceByKey to generate reports efficiently, highlighting Spark’s performance advantage over MapReduce.	GitHub Repo	08/05/2025
Azure Synapse Analytics Mini Project	Built a data pipeline in Azure Synapse Analytics to load product data from Azure Data Lake into a dedicated SQL pool. Implemented data flows with inserts and upserts, handling schema drift and Type 1 SCD updates, and orchestrated ingestion using Synapse Studio pipelines.	GitHub Repo	07/18/2025
Azure Databricks Mini Project	Implemented a PySpark mini-project in Azure Databricks to ingest, query, and transform datasets. Built solutions using PySpark DataFrame syntax rather than SparkSQL, demonstrating data ingestion, transformations, and query patterns within notebooks submitted as part of the Springboard boot camp.	GitHub Repo	07/16/2025
MySQL Python Data Pipeline Mini Project	Developed a Python and SQL data pipeline for an event ticketing system. Designed a MySQL table schema, ingested CSV sales data via Python connectors, and implemented queries to analyze ticket popularity and sales trends, showcasing ETL and database integration skills.	GitHub Repo	07/14/2025
PostgreSQL Tuning Mini Project	Optimized PostgreSQL queries on a computer science publications dataset. Created tables, ingested CSVs, and wrote queries to analyze conferences, authors, and publication trends. Improved performance by designing indexes, refining join/filter logic, and evaluating execution plans with EXPLAIN, demonstrating query tuning and indexing strategies.	GitHub Repo	03/21/2025
Advanced MySQL Query Tuning Mini Project	Analyzed EuroCup 2016 data with advanced SQL queries. Imported CSV datasets into MySQL, designed a schema with match, player, and referee details, and implemented queries covering match outcomes, penalty shootouts, player stats, bookings, substitutions, and referee activity to explore tournament dynamics.	GitHub Repo	03/08/2025
Python OOP Mini Project	Implemented a simplified banking system in Python using OOP principles. Modeled customers, accounts, employees, and services such as loans and credit cards. Applied PEP-8 style, logging, and exception handling, with UML-based design and a command-line interface for deposits, withdrawals, and account management.	GitHub Repo	02/13/2025
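
For the Guided Capstone's 30-minute moving averages, the windowing logic can be sketched in plain Python. This is an illustration, not the capstone's Spark code: it assumes trades arrive as (symbol, event time, price) tuples and computes a trailing average over each symbol's trades from the previous 30 minutes, inclusive of the current trade.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Deque, Dict, List, Tuple

WINDOW = timedelta(minutes=30)  # trailing window length

Trade = Tuple[str, datetime, float]  # assumed shape: (symbol, event time, trade price)


def moving_averages(trades: List[Trade]) -> List[Tuple[str, datetime, float, float]]:
    """Attach a trailing 30-minute moving average of trade price to each trade."""
    windows: Dict[str, Deque[Tuple[datetime, float]]] = defaultdict(deque)
    sums: Dict[str, float] = defaultdict(float)
    result = []
    # Process trades in event-time order so each per-symbol window only slides forward.
    for symbol, ts, price in sorted(trades, key=lambda t: t[1]):
        win = windows[symbol]
        win.append((ts, price))
        sums[symbol] += price
        # Evict trades that fall more than 30 minutes before the current one.
        while win[0][0] < ts - WINDOW:
            _, old_price = win.popleft()
            sums[symbol] -= old_price
        result.append((symbol, ts, price, sums[symbol] / len(win)))
    return result
```

In Spark, the same idea is usually expressed with a window partitioned by symbol, ordered by event time (cast to seconds), and bounded with `rangeBetween` over the preceding 30 minutes.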
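
The Kafka project's routing step (suspicious transactions go to a separate topic) can be isolated from the broker. In this sketch the topic names, the `amount` field, and the 900 threshold are all assumptions for illustration, not the project's confirmed values; the commented loop shows where kafka-python's `KafkaConsumer` and `KafkaProducer` would plug in.

```python
import json
from typing import Dict, Tuple

# Hypothetical topic names and threshold; the project's actual values may differ.
LEGIT_TOPIC = "streaming.transactions.legit"
FRAUD_TOPIC = "streaming.transactions.fraud"
AMOUNT_THRESHOLD = 900.0


def is_suspicious(txn: Dict) -> bool:
    """Flag transactions at or above the assumed amount threshold."""
    return float(txn["amount"]) >= AMOUNT_THRESHOLD


def route(raw_message: bytes) -> Tuple[str, bytes]:
    """Decode a JSON transaction and choose the output topic for it."""
    txn = json.loads(raw_message)
    topic = FRAUD_TOPIC if is_suspicious(txn) else LEGIT_TOPIC
    return topic, json.dumps(txn).encode("utf-8")


# With kafka-python, the detector loop would look roughly like:
# consumer = KafkaConsumer("transactions", bootstrap_servers="localhost:9092")
# producer = KafkaProducer(bootstrap_servers="localhost:9092")
# for message in consumer:
#     topic, payload = route(message.value)
#     producer.send(topic, value=payload)
```

Keeping the classification rule in a pure function lets it be unit-tested without a running Kafka cluster.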
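
A log analyzer like the one in the Airflow project can be sketched as a line scanner that tallies log levels and collects ERROR messages. The line format below (`[timestamp] {source:line} LEVEL - message`) is an assumption modeled on typical Airflow task logs, and both helper functions are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path
from typing import Iterable, List, Tuple

# Assumed log line format: "[timestamp] {module.py:line} LEVEL - message"
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+\{(?P<src>[^}]+)\}\s+(?P<level>[A-Z]+)\s+-\s+(?P<msg>.*)$"
)


def analyze_lines(lines: Iterable[str]) -> Tuple[int, List[str], Counter]:
    """Count ERROR entries, collect their messages, and tally all log levels."""
    levels: Counter = Counter()
    errors: List[str] = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # skip lines that don't match, such as stack-trace bodies
        levels[m["level"]] += 1
        if m["level"] == "ERROR":
            errors.append(m["msg"])
    return len(errors), errors, levels


def analyze_log_dir(root: Path) -> Tuple[int, List[str], Counter]:
    """Scan every *.log file under a logs directory (layout assumed)."""
    all_lines: List[str] = []
    for path in sorted(root.rglob("*.log")):
        all_lines.extend(path.read_text(encoding="utf-8", errors="replace").splitlines())
    return analyze_lines(all_lines)
```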
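
The Post Sales Redesign's two passes (attach make/year to accident records, then count accidents per vehicle) can be mimicked in plain Python. This assumes a simplified record layout of (incident type, VIN, make, year), where "I" marks the initial sale that carries make and year and "A" marks an accident. In Spark, the first function corresponds roughly to a `groupByKey` on VIN and the second to `map(lambda k: (k, 1)).reduceByKey(add)`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed record layout: (incident_type, vin, make, year); "I" = initial sale record
# carrying make/year, "A" = accident record where make/year may be blank.
Record = Tuple[str, str, str, str]


def propagate_make_year(records: Iterable[Record]) -> List[Record]:
    """Group by VIN (the groupByKey step) and copy make/year onto accident records."""
    by_vin: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        by_vin[rec[1]].append(rec)
    enriched: List[Record] = []
    for vin, recs in by_vin.items():
        make, year = "", ""
        for kind, _, mk, yr in recs:
            if kind == "I":
                make, year = mk, yr
        # Emit one enriched record per accident for this VIN.
        for kind, _, _, _ in recs:
            if kind == "A":
                enriched.append(("A", vin, make, year))
    return enriched


def count_accidents(enriched: Iterable[Record]) -> Dict[str, int]:
    """Key each accident by make-year and sum the counts (the reduceByKey step)."""
    counts: Dict[str, int] = defaultdict(int)
    for _, _, make, year in enriched:
        counts[f"{make}-{year}"] += 1
    return dict(counts)
```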
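
The Type 1 SCD behavior from the Synapse project (insert new keys, overwrite changed attributes with no history) can be shown with an in-memory dimension. This is a language-level sketch of the semantics, not Synapse data flow configuration; the `ProductID` key and attribute names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical dimension: business key -> attribute dict (e.g. product name, price).
Dimension = Dict[str, Dict[str, object]]


def scd1_upsert(
    dim: Dimension, incoming: Iterable[Dict[str, object]], key: str = "ProductID"
) -> Tuple[int, int]:
    """Apply Type 1 SCD semantics and return (inserted, updated) counts."""
    inserted = updated = 0
    for row in incoming:
        k = str(row[key])
        attrs = {c: v for c, v in row.items() if c != key}
        if k not in dim:
            dim[k] = attrs  # new business key: insert
            inserted += 1
        elif dim[k] != attrs:
            # Type 1: no history is kept; old values are simply overwritten.
            dim[k] = attrs
            updated += 1
    return inserted, updated
```

A Type 2 dimension would instead close out the old row and add a new versioned one; Type 1 trades that history for simplicity.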
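
The PostgreSQL tuning workflow (compare the plan, add an index, compare again) can be reproduced without a server by swapping in SQLite's `EXPLAIN QUERY PLAN` through Python's built-in `sqlite3` module. This is a stand-in, not PostgreSQL; the `publications` table and `idx_pub_year` index are hypothetical.

```python
import sqlite3

# Stand-in: SQLite's EXPLAIN QUERY PLAN instead of PostgreSQL's EXPLAIN, so the
# sketch runs without a database server. Table and index names are hypothetical.
QUERY = "SELECT title FROM publications WHERE year = ?"


def plan(conn: sqlite3.Connection) -> str:
    """Return the query plan's detail column (last field of each row) as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (2010,)).fetchall()
    return " | ".join(str(r[-1]) for r in rows)


def demo() -> tuple:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE publications (pubkey TEXT PRIMARY KEY, title TEXT, year INTEGER)")
    conn.executemany(
        "INSERT INTO publications VALUES (?, ?, ?)",
        [(f"key{i}", f"Paper {i}", 1990 + i % 30) for i in range(1000)],
    )
    before = plan(conn)  # without an index on year: a full table scan
    conn.execute("CREATE INDEX idx_pub_year ON publications(year)")
    after = plan(conn)  # with the index: a search using idx_pub_year
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("before:", before)
    print("after: ", after)
```

In PostgreSQL, the equivalent check is running `EXPLAIN` (or `EXPLAIN ANALYZE`) before and after `CREATE INDEX` and looking for a `Seq Scan` turning into an `Index Scan` or `Bitmap Index Scan`.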

🏷️ Tags

#SQL #Azure #Airflow #Spark #Kafka #DataPipeline #ETL #DataEngineering #Monitoring #Streaming #Automation

📚 Bootcamp Summary

📅 35+ weeks of guided, project-based curriculum
✏️ 10 mini-projects + 1 guided and 1 unguided capstone
🌐 Focus: cloud computing, big data, orchestration, performance optimization
✅ Verified by mentor checkpoints and progress metrics

🛠️ Skills & Tools

🧰 Core Stack

🛠️ Supporting Tools

Additional Tags

Tools used in real projects: data pipelines, cloud orchestration, SQL optimization, and dashboarding.

📬 Let’s Connect

📧 Reach me on LinkedIn
🧠 Ask me about boot camp time tracking, SQL optimization, or orchestration frameworks!

Generated automatically via Python on 11-11-2025 18:23:50