wyang10/README.md

Hi there 👋 I’m Audrey~ 🚀

About Me 🌱

I'm a Cloud Data Engineer building scalable, reliable, and cost-efficient cloud data platforms.
I specialize in turning raw, messy, multi-source data into trusted analytics layers and ML-ready pipelines
through a mix of modern ELT, streaming systems, and strong distributed systems fundamentals.


Quick Pitch 💬

🎓 MSCS @ Northeastern University (2022–2024)
☁️ Focus: Cloud-Native Data Engineering
🔗 Connect: GitHub: wyang10 • LinkedIn: linkedin.com/in/awhy


Highlights 💡

  • Focused on building cloud-native, event-driven data systems on AWS and GCP.
  • Experienced in delivering data platforms and analytics pipelines with built-in data quality checks and schema governance.
  • Strong in reliability engineering (idempotency, DLQ/replay, observability), IaC (Terraform), Kubernetes, and CI/CD.

Experience 🧩

Data Engineer, LumiereX (Jan 2025 – Present)

  • Built event-driven, serverless ELT ingestion on AWS (S3, API Gateway, Lambda, SQS, Glue, Step Functions).
  • Improved data quality layers and optimized Spark jobs for cost and performance.
  • Implemented reliability engineering practices (idempotency, DLQ/replay, observability).

Software Engineer Intern, VisionX (Jan 2024 – Jul 2024)

  • Contributed to a Kafka → Flink streaming pipeline that enables real-time ML scoring for IoT sensor data.
  • Worked on modules for schema governance, ingestion reliability, and validation checks.
  • Containerized Flink jobs with Docker and deployed them to Kubernetes.

Featured Projects 👨‍💻

  • Orchestration: EventBridge → Step Functions → Glue Job + optional Great Expectations gate.
  • Catalog / Query: Glue Data Catalog + Crawler + Athena tables for silver/ Parquet.
  • Replay / Recovery: replay & dlq-redrive scripts for backfill and poison-message recovery.
  • Idempotency: DynamoDB TTL for object-level dedup, optional GSI for audit.
  • CI/CD: GitHub Actions pipelines (Lambda build+deploy, Terraform plan+apply).
  • End-to-End, Reproducible ML Pipeline: engineered a modular, production-style ML system for predicting in-hospital mortality.
  • Goes from raw CSV → cleaned features → baseline models → reproducible CLI pipeline, with optional SMOTE to address severe class imbalance.
  • A production-ready ELT & Data Quality Framework using Airflow + dbt + Snowflake + Great Expectations + CI/CD.
  • Automates data ingestion, transformation, testing, and lineage in a reproducible orchestration system.
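
The object-level dedup idea from the Idempotency bullet can be illustrated with a minimal sketch. This is not the project's code: `IdempotencyStore` and `handle_event` are hypothetical names, and the store is an in-memory stand-in for a DynamoDB table with a TTL attribute. Against a real table, the claim step would typically be a conditional write that fails when the key already exists.

```python
import time
from typing import Callable, Dict


class IdempotencyStore:
    """Hypothetical in-memory stand-in for a DynamoDB table with a TTL attribute."""

    def __init__(self, ttl_seconds: int, clock: Callable[[], float] = time.time) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at: Dict[str, float] = {}

    def claim(self, object_key: str) -> bool:
        """Return True if the key is new (or expired) and has now been claimed."""
        now = self._clock()
        expiry = self._expires_at.get(object_key)
        if expiry is not None and expiry > now:
            return False  # duplicate delivery inside the TTL window
        self._expires_at[object_key] = now + self._ttl
        return True


def handle_event(
    store: IdempotencyStore,
    bucket: str,
    key: str,
    etag: str,
    process: Callable[[str, str], None],
) -> str:
    """Process an object notification at most once per TTL window."""
    # Assumption: bucket + key + etag identifies one version of an object.
    object_key = f"{bucket}/{key}#{etag}"
    if not store.claim(object_key):
        return "skipped"
    process(bucket, key)
    return "processed"
```

The sketch compares the expiry timestamp explicitly rather than relying on expired records having been removed, so a stale record is never mistaken for a live claim.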

How I Work 👯

  • I design modular, observable pipelines that are easy to test, debug, and scale.
  • I weigh trade-offs to balance team velocity, reliability, and cloud cost efficiency.
  • I enjoy collaborations involving data modeling, pipeline quality, and distributed system design.

Core Skills ⚡

Languages & Tools
Python (Pandas, PySpark) • SQL • Java • Bash

Cloud & Orchestration
GCP (BigQuery, Dataflow) • AWS (S3, EMR, Glue, Lambda, SQS, Step Functions, IAM)
GitHub Actions • Airflow • dbt • Docker • Kubernetes • Terraform

Big Data & Storage
Spark • Kafka • Flink • Databricks • Delta Lake
Snowflake • Parquet • SCD Type 2 • dimensional modeling

Data Quality & CI/CD
Great Expectations • dbt tests • automated lineage • monitoring


😄 Thanks for stopping by! 👋

Pinned

  1. AWS-Serverless-ELT-Pipeline-Enterprise Public

    Enterprise track: Step Functions/EventBridge + Glue + data quality on top of the v1 serverless ELT

    Python 1

  2. Smote-Heart-Attack-ML Public

    A Modular, Production-Style ML Pipeline with Class-Imbalance Handling

    Jupyter Notebook 1

  3. airflow_dbt_demo Public

    Airflow + dbt + Snowflake + Postgres + Docker + CI/CD (Postgres‑backed)

    HTML 1

  4. Openai-DBAuctionSystem Public

    DBAuctionSystem: Furniture Auction Platform (Django + MySQL + Streamlit)

    JavaScript 1

  5. AI-Photo-Generator Public

    End-to-end system for generating compliant ID photos from user uploads, featuring a production-style workflow from raw images → segmentation/matting → face-aligned cropping → background synthesis →…

    JavaScript 1

  6. Android_WeatherFinder Public

    A simple Android app to search weather by city name and display real‑time weather info (city, country, description, temperature

    Java 1