
Hi there, I'm Gerardo Toboso 👋

Data & Analytics Specialist | Data Engineering & Science

LinkedIn · Email · Portfolio



I bridge the gap between raw data infrastructure and decision-ready analytics.
Currently completing my BSc in Data Science (Top 10% – GPA 9.0 / 10.0) while building scalable, production-grade data platforms.


🚀 About Me

I define myself as a Data & Analytics Specialist because I don’t just move data — I model it, validate it, and make it trustworthy.
My work sits at the intersection of Data Engineering, Analytics Engineering, and Business Impact.

  • 🔭 Focus: ETL/ELT pipelines, Data Lakes, dbt-based analytics platforms, dimensional modeling
  • 💼 Experience: Designed data systems handling millions of records per day, with automated testing, observability, and documentation
  • 🌱 Learning: Deepening my knowledge in Apache Airflow, cloud-native data lakes (AWS/GCP), and analytical engines like DuckDB

🛠️ Tech Stack & Tools

Focused on modern Data Engineering & Analytics architectures.

  • Languages: Python, SQL
  • Engineering & Cloud: PySpark, Airflow, AWS S3, DuckDB
  • Analytics Engineering: dbt
  • DevOps & CI/CD: Docker, Git, GitHub Actions
  • Analytics & BI: Pandas, Power BI

🏆 Featured Projects

fintech-flow-dbt: a professional analytics engineering platform for financial data.

  • The Challenge: A fragmented fintech ecosystem powered by 25+ isolated SQL scripts, with no testing, documentation, or single source of truth.
  • The Solution: Migrated the entire transformation layer to a dbt-core architecture using DuckDB, implementing a multi-layer model (Staging → Intermediate → Marts).
  • Key Engineering Decisions:
    • Implemented SCD Type 2 dimensions to preserve historical accuracy in customer and investment profiles (see the sketch after this list)
    • Built a comprehensive testing suite (unique, not_null, accepted_values) to guarantee financial integrity
    • Created custom dbt macros to modularize logic and accelerate development
  • Impact:
    • 🧪 80% reduction in production bugs through automated data tests
    • ⚡ 50% faster development of new risk and analytics models
    • 📚 Reduced analyst onboarding time from 2 weeks to 2 days via dbt Docs
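As a concrete illustration of the SCD Type 2 pattern those dimensions follow, here is a minimal close-out-and-insert sketch using DuckDB in Python. The dim_customer table and its columns are hypothetical stand-ins; the actual project builds these dimensions declaratively inside dbt.

```python
# Minimal SCD Type 2 illustration with DuckDB.
# Table and column names are hypothetical; the project implements this in dbt.
import duckdb

con = duckdb.connect()  # in-memory database

con.execute("""
    CREATE TABLE dim_customer (
        customer_id INTEGER,
        risk_profile VARCHAR,
        valid_from   DATE,
        valid_to     DATE,      -- NULL marks the current row
        is_current   BOOLEAN
    )
""")
con.execute("""
    INSERT INTO dim_customer VALUES
        (1, 'conservative', DATE '2024-01-01', NULL, TRUE)
""")

# An incoming change: customer 1 moves to a 'moderate' risk profile.
# Step 1: close out the current row instead of overwriting it.
con.execute("""
    UPDATE dim_customer
    SET valid_to = DATE '2024-06-01', is_current = FALSE
    WHERE customer_id = 1 AND is_current
""")
# Step 2: insert the new version, so full history is preserved.
con.execute("""
    INSERT INTO dim_customer VALUES
        (1, 'moderate', DATE '2024-06-01', NULL, TRUE)
""")

print(con.execute("SELECT * FROM dim_customer ORDER BY valid_from").fetchall())
```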

iot-etl-pipeline: from raw industrial sensor data to BI-ready dimensional models.

  • The Challenge: Process millions of industrial IoT readings daily from an external API with reliability, scalability, and low latency.
  • The Solution: Designed and implemented an end-to-end ETL pipeline orchestrated with Apache Airflow, using PySpark for distributed transformations and AWS S3 as a scalable data lake.
  • Architecture: Raw → Processed → Analytics layers, Parquet partitioning & versioning, and a star schema optimized for BI (sketched after this list).
  • Impact:
    • ⚙️ 99.9% pipeline uptime with failure detection in under 5 minutes
    • 🚀 ~50 GB/min processed without memory issues
    • 💾 Optimized query performance and reduced storage costs via Parquet & partitioning
    • 📈 Designed for >1 TB/day scalability
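To make the layered architecture concrete, below is a minimal PySpark sketch of the Raw → Processed hop. The s3a://iot-lake bucket and the sensor columns are assumptions for illustration; the production pipeline wraps steps like this in Airflow tasks.

```python
# Sketch of the Raw -> Processed layer hop.
# Bucket paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("iot-etl").getOrCreate()

# Ingest raw JSON readings landed by the extraction step.
raw = spark.read.json("s3a://iot-lake/raw/readings/")

processed = (
    raw.withColumn("reading_ts", F.to_timestamp("reading_ts"))
       .withColumn("reading_date", F.to_date("reading_ts"))
       .dropDuplicates(["sensor_id", "reading_ts"])   # idempotent re-runs
       .filter(F.col("value").isNotNull())            # drop unusable readings
)

# Date-partitioned Parquet keeps scans cheap for date-bounded BI queries.
(processed.write
    .mode("overwrite")
    .partitionBy("reading_date")
    .parquet("s3a://iot-lake/processed/readings/"))
```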

server-logs-sql-analysis: turning raw access logs into actionable infrastructure insights.

  • The Challenge: Production API showing degraded performance with no visibility into root causes.
  • The Solution: a SQL-based diagnostic pipeline built on DuckDB, paired with interactive Looker Studio dashboards (a query sketch follows below).
  • Impact: 📉 Identified 9 of 11 endpoints with >20% error rate, pinpointing services causing 35.78% of all 5xx errors.
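For flavor, this is the kind of DuckDB aggregation such a diagnostic pipeline runs to surface failing endpoints. The access_logs.csv file and its endpoint/status columns are hypothetical stand-ins for the real log schema.

```python
# Hypothetical example of a per-endpoint 5xx error-rate query with DuckDB.
import duckdb

con = duckdb.connect()
print(con.execute("""
    SELECT * FROM (
        SELECT
            endpoint,
            COUNT(*) AS requests,
            AVG(CASE WHEN status BETWEEN 500 AND 599
                     THEN 1.0 ELSE 0.0 END) AS error_rate
        FROM read_csv_auto('access_logs.csv')
        GROUP BY endpoint
    )
    WHERE error_rate > 0.20      -- flag endpoints above a 20% error rate
    ORDER BY error_rate DESC
""").fetchdf())
```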

ecommerce-reporting-etl: solving the “stale data” problem for business stakeholders.

  • The Challenge: Sales team spent ~2 hours/day manually merging CSVs.
  • The Solution: End-to-end Python ETL with data quality checks and Parquet optimization (see the sketch after this list).
  • Impact: 📉 97% reduction in reporting latency (fully automated, daily refresh).
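A minimal sketch of the merge-validate-write core of such a pipeline, assuming a hypothetical sales_exports/ directory of daily CSVs; the real ETL adds scheduling and business metrics on top.

```python
# Merge daily CSV exports, validate them, and write optimized Parquet.
# Directory layout and column names are hypothetical.
from pathlib import Path
import pandas as pd

frames = [pd.read_csv(p) for p in sorted(Path("sales_exports").glob("*.csv"))]
sales = pd.concat(frames, ignore_index=True)

# Data-quality gates: fail loudly before anything reaches reporting.
assert sales["order_id"].notna().all(), "missing order_id values"
assert not sales.duplicated(subset=["order_id"]).any(), "duplicate orders"

# Columnar Parquet output keeps downstream reads fast and strongly typed.
Path("reporting").mkdir(exist_ok=True)
sales.to_parquet("reporting/sales.parquet", index=False)
```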

📈 GitHub Stats

[GitHub stats & top-languages cards]

Pinned

  1. iot-etl-pipeline (Public)

    Batch ETL pipeline with PySpark and Apache Airflow that processes millions of IoT sensor readings, transforming them into a Star Schema optimized for smart-manufacturing analytics.

    Python

  2. fintech-flow-dbt (Public)

    Analytics Engineering pipeline for a fintech ecosystem. Built on dbt and DuckDB to transform raw banking data into reliable, tested, and documented analytical marts.

  3. server-logs-sql-analysis (Public)

    Full analysis of web server logs to detect patterns in endpoint usage and suggest areas of improvement for the development team.

    Jupyter Notebook

  4. ecommerce-reporting-etl (Public)

    Automated ETL (Extract, Transform, Load) pipeline designed to process and analyze e-commerce transactional data, generating business-critical metrics for strategic decision-making.

    Python