I bridge the gap between raw data infrastructure and decision-ready analytics.
Currently completing my BSc in Data Science (Top 10% – GPA 9.0 / 10.0) while building scalable, production-grade data platforms.
I define myself as a Data & Analytics Specialist because I don’t just move data — I model it, validate it, and make it trustworthy.
My work sits at the intersection of Data Engineering, Analytics Engineering, and Business Impact.
- 🔭 Focus: ETL/ELT pipelines, Data Lakes, dbt-based analytics platforms, dimensional modeling
- 💼 Experience: Designed data systems handling millions of records per day, with automated testing, observability, and documentation
- 🌱 Learning: Deepening my knowledge in Apache Airflow, cloud-native data lakes (AWS/GCP), and analytical engines like DuckDB
Focused on modern Data Engineering & Analytics architectures.
| Domain | Tools |
|---|---|
| Languages | Python, SQL |
| Engineering & Cloud | Apache Airflow, PySpark, AWS S3, GCP |
| Analytics Engineering | dbt, DuckDB, dimensional modeling |
| DevOps & CI/CD | |
| Analytics & BI | Looker Studio |
A professional analytics engineering platform for financial data.
- The Challenge: A fragmented fintech ecosystem powered by 25+ isolated SQL scripts, with no testing, documentation, or single source of truth.
- The Solution: Migrated the entire transformation layer to a dbt-core architecture using DuckDB, implementing a multi-layer model (Staging → Intermediate → Marts).
- Key Engineering Decisions:
- Implemented SCD Type 2 dimensions to preserve historical accuracy in customer and investment profiles
- Built a comprehensive testing suite (`unique`, `not_null`, `accepted_values`) to guarantee financial integrity (see the data-quality sketch after this project)
- Created custom dbt macros to modularize logic and accelerate development
- Impact:
- 🧪 80% reduction in production bugs through automated data tests
- ⚡ 50% faster development of new risk and analytics models
- 📚 Reduced analyst onboarding time from 2 weeks to 2 days via dbt Docs
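
To make the integrity checks concrete, here is a minimal, illustrative sketch of the three rules the dbt suite declares (`unique`, `not_null`, `accepted_values`), expressed directly against DuckDB from Python rather than in the project's actual `schema.yml`. Table and column names are hypothetical placeholders.

```python
import duckdb

con = duckdb.connect()  # in-memory database, just for the example
con.execute("""
    CREATE TABLE stg_transactions AS
    SELECT * FROM (VALUES
        (1, 'deposit',    100.0),
        (2, 'withdrawal',  40.0),
        (3, 'deposit',     25.0)
    ) AS t(transaction_id, transaction_type, amount)
""")

# Each query counts violating rows; zero failures means the check passes,
# mirroring how dbt's generic tests report results.
checks = {
    "unique_transaction_id": """
        SELECT COUNT(*) FROM (
            SELECT transaction_id FROM stg_transactions
            GROUP BY transaction_id HAVING COUNT(*) > 1
        )
    """,
    "not_null_amount": "SELECT COUNT(*) FROM stg_transactions WHERE amount IS NULL",
    "accepted_values_transaction_type": """
        SELECT COUNT(*) FROM stg_transactions
        WHERE transaction_type NOT IN ('deposit', 'withdrawal')
    """,
}

for name, sql in checks.items():
    failures = con.execute(sql).fetchone()[0]
    status = "PASS" if failures == 0 else f"FAIL ({failures} rows)"
    print(f"{name}: {status}")
```

In the project itself these rules live in dbt YAML and run with `dbt test` (or as part of `dbt build`), so they execute on every deployment rather than ad hoc.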
From raw industrial sensor data to BI-ready dimensional models.
- The Challenge: Process millions of industrial IoT readings daily from an external API with reliability, scalability, and low latency.
- The Solution: Designed and implemented an end-to-end ETL pipeline orchestrated with Apache Airflow, using PySpark for distributed transformations and AWS S3 as a scalable data lake (see the orchestration sketch below).
- Architecture: Raw → Processed → Analytics layers, Parquet partitioning & versioning, and a star schema optimized for BI.
- Impact:
- ⚙️ 99.9% pipeline uptime with failure detection in under 5 minutes
- 🚀 ~50 GB/min processed without memory issues
- 💾 Optimized query performance and reduced storage costs via Parquet & partitioning
- 📈 Designed for >1 TB/day scalability
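
For orientation, here is a minimal DAG skeleton (recent Airflow 2.x, TaskFlow API) mirroring the Raw → Processed → Analytics flow described above. It is an illustrative sketch, not the production DAG: task bodies, bucket names, and the PySpark hand-off are hypothetical placeholders.

```python
from datetime import datetime

from airflow.decorators import dag, task
from airflow.operators.python import get_current_context


@dag(
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
    default_args={"retries": 2},
    tags=["iot", "etl"],
)
def iot_sensor_pipeline():
    @task
    def extract_raw() -> str:
        # Pull the day's readings from the external API and land them untouched
        # in the raw layer of the lake. Bucket and prefix are placeholders.
        ds = get_current_context()["ds"]
        return f"s3://example-data-lake/raw/date={ds}/"

    @task
    def transform(raw_path: str) -> str:
        # In the real pipeline this step hands off to a PySpark job that cleans,
        # deduplicates, and writes partitioned Parquet to the processed layer.
        return raw_path.replace("/raw/", "/processed/")

    @task
    def load_analytics(processed_path: str) -> None:
        # Refresh the star-schema tables that the BI layer queries.
        print(f"Refreshing analytics layer from {processed_path}")

    load_analytics(transform(extract_raw()))


iot_sensor_pipeline()
```

Keeping extract, transform, and load as separate tasks gives per-task retries and clear failure boundaries, which is what makes fast failure detection and alerting straightforward.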
Turning raw access logs into actionable infrastructure insights.
- The Challenge: Production API showing degraded performance with no visibility into root causes.
- The Solution: SQL-based diagnostic pipeline using DuckDB + interactive Looker Studio dashboards (see the query sketch below).
- Impact: 📉 Identified 9 of 11 endpoints with >20% error rate, pinpointing services causing 35.78% of all 5xx errors.
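
A minimal sketch of the kind of per-endpoint diagnostic query behind those numbers, assuming a hypothetical `access_logs` schema with `endpoint` and `status_code` columns; a few in-memory rows stand in for the production logs so the example runs as-is.

```python
import duckdb

con = duckdb.connect()  # in-memory; the real pipeline points at exported access logs
con.execute("""
    CREATE TABLE access_logs AS
    SELECT * FROM (VALUES
        ('/api/orders',   200), ('/api/orders',   502),
        ('/api/payments', 500), ('/api/payments', 200),
        ('/api/users',    200), ('/api/users',    200)
    ) AS t(endpoint, status_code)
""")

rows = con.execute("""
    SELECT
        endpoint,
        COUNT(*) AS requests,
        ROUND(100.0 * AVG(CASE WHEN status_code >= 400 THEN 1 ELSE 0 END), 2) AS error_rate_pct,
        SUM(CASE WHEN status_code BETWEEN 500 AND 599 THEN 1 ELSE 0 END) AS server_errors_5xx
    FROM access_logs
    GROUP BY endpoint
    ORDER BY error_rate_pct DESC
""").fetchall()

for endpoint, requests, error_rate_pct, server_errors_5xx in rows:
    print(f"{endpoint}: {requests} reqs, {error_rate_pct}% errors, {server_errors_5xx} 5xx")
```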
Solving the “stale data” problem for business stakeholders.
- The Challenge: Sales team spent ~2 hours/day manually merging CSVs.
- The Solution: End-to-end Python ETL with data quality checks and Parquet optimization (see the ETL sketch below).
- Impact: 📉 97% reduction in reporting latency (fully automated, daily refresh).
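
A compact sketch of the consolidation step using pandas, assuming a hypothetical drop folder of CSV exports and a hypothetical `order_id` key; the production job wraps this in scheduling and fuller validation.

```python
from pathlib import Path

import pandas as pd  # Parquet output also needs pyarrow (or fastparquet) installed

RAW_DIR = Path("data/raw_csv")              # hypothetical drop folder for CSV exports
OUTPUT = Path("data/sales_daily.parquet")   # hypothetical publish location

# Merge the day's CSV exports into one frame
frames = [pd.read_csv(path) for path in sorted(RAW_DIR.glob("*.csv"))]
df = pd.concat(frames, ignore_index=True)

# Basic quality checks before publishing
df = df.drop_duplicates(subset=["order_id"])            # hypothetical primary key
assert df["order_id"].notna().all(), "order_id must not be null"
assert (df["amount"] >= 0).all(), "negative amounts found"

# Columnar output keeps downstream reads fast and storage small
df.to_parquet(OUTPUT, index=False)
print(f"Wrote {len(df):,} rows to {OUTPUT}")
```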


