sofaquitegud/README.md

☕️ Data Infrastructure & Analytics | Muhammad Syafiq Farhan

Bachelor of Computer Science (Honours) in Data Science & Computational Intelligence

High-impact Data Professional specializing in building end-to-end ELT pipelines and distributed streaming architectures. I focus on turning complex, unstructured data into production-ready insights through precision engineering and automated orchestration.


🟢 Status Report

  • 💼 Current Role: Analytics Engineer at OR Technologies.
  • 🚀 Performance: Achieved a 70% reduction in manual tax reporting overhead via automation.
  • Dependencies: Python, SQL, Docker, and a double-shot of Espresso.
  • 🛠 System Admin: Executed end-to-end migrations and version upgrades for Alteryx and Tableau Server environments.

🛠 Technical Loadout

| Category | Tools |
| --- | --- |
| Languages | Python (Pandas, NumPy), SQL (PostgreSQL, BigQuery) |
| Orchestration | Apache Airflow, Alteryx Designer, Alteryx Server |
| Visualization | Tableau, Metabase, Streamlit, Power BI |
| Compute | PySpark, Apache Spark |
| Streaming | Apache Kafka, Zookeeper |
| Transformation | dbt Core, Great Expectations |
| Infrastructure | Docker, Docker Compose, MinIO, Git |

🏆 Featured Projects

The Build: Production-grade Medallion Architecture (Raw -> Processed -> Analytics).

  • The W: Orchestrated 5+ Airflow DAGs for automated backfills and daily ingestion.
  • Reliability: Built a Star Schema in PostgreSQL with automated quality gating.

The Build: Fault-tolerant streaming pipeline for market volatility tracking with sub-second latency.

  • The W: Hybrid processing layer using Kafka for real-time alerts and Spark for batch aggregations.

The Build: Spatial-temporal analysis of 560,000+ player tracking records.

  • The W: Leveraged vectorized Python to quantify player efficiency through complex spatial feature engineering.

The Build: Automated multi-jurisdiction tax engine providing 24/7 reliability for enterprise compliance.

  • The W: 70% manual effort reduction; engineered complex rule-based workflows in Alteryx Designer.

☕️ Connect

Fuelled by double-shot espressos and optimized schemas. Let's talk data.


"Scalable systems aren't built on luck—they're built on logic and caffeine." ☕️

Pinned

  1. taxi-data-pipeline (Public)

    End-to-end data engineering pipeline for NYC taxi data using Airflow, PySpark, dbt, and MinIO.

    Python

  2. pulse-stream (Public)

    Real-time streaming analytics pipeline for cryptocurrency prices using Kafka, PostgreSQL, PySpark, and Streamlit.

    Python

  3. sales-use-tax-analytics (Public)

    Automated sales and use tax calculation workflow using Alteryx, with interactive Tableau dashboards for reporting and insights.

  4. NFL_Big_Data_Bowl_2026 (Public)

    Analyzed weekly NFL player-tracking data using Python (Pandas, NumPy, Matplotlib). Combined multiple weeks of input and output files into a unified dataset to explore player movement, speed, and po…

    Jupyter Notebook 1

  5. sofaquitegud.github.io (Public)

    My portfolio website

    CSS

  6. YOLO_OBJECT_DETECTION (Public)

    Object detection project using YOLOv8s for live and spoof data.

    Jupyter Notebook