DOCUTEE/README.md

Nguyen Minh Quang

Hello! I’m Nguyen Minh Quang, a Data Engineer passionate about building scalable data pipelines and delivering actionable insights from large datasets.


Education

HCM University of Technology and Education (HCMUTE)
Bachelor of Engineering in Data Engineering (2022 – 2026 expected)
GPA: 2.93 / 4


Achievements

  • 3rd Prize – 33rd Vietnam Student Olympiad in Informatics (OLP’24)
  • Honorable Mention – ICPC Asia Pacific Programming Contest 2022 & 2023
  • Consolation Prize – ICPC 2022 Vietnam National Contest (Non-IT)
  • 3rd Prize – Central & Central Highlands Olympiad of Informatics (2021)
  • Solved 900+ problems on Codeforces using C++
  • LeetCode SQL 50 Badge – Completed 50 essential SQL problems (basic → intermediate)

Skills

  • Languages: C++, SQL, Python, Java
  • Databases: SQL Server, MySQL, MongoDB
  • Big Data: Hadoop, Spark (PySpark), Hive, Airflow, dbt, Kafka
  • Visualization: Power BI, Superset
  • Cloud & Storage: AWS (Athena, Glue, S3, EC2, Cloud9, EMR), MinIO (S3-compatible)
  • Dev Tools: Docker, Git
  • OS: Linux (Ubuntu)
  • Spoken Languages: English (fluent reading; strong writing & communication)
  • AI/ML Tools: ChatGPT, Copilot, Gemini, Claude, Grok, DeepSeek

Projects

SteamMind – Steam Reviews Analysis Platform

Role: Data Engineer | Team of 2

  • Built a lakehouse platform ingesting, transforming, and analyzing 100M+ Steam reviews.
  • Technologies: Python, Spark, dbt, Hive Metastore, HDFS, Iceberg, Airflow, Trino, Superset, Docker
  • Responsibilities included partitioned batch processing, schema evolution, data cleaning, and orchestration with Airflow.
  • Source

CFBIGDATA – Codeforces Submission Analytics

Role: Data Engineer | Team of 5

  • Built a real-time ETL pipeline for live contest data and historical trend analysis.
  • Technologies: Python, Spark, Kafka, Delta Lake, MinIO, Grafana, Superset, Airflow, Docker
  • Designed ELT workflows, validated streaming JSON, and powered sub-minute dashboards via ClickHouse.
  • Source

MacScan – MacBook Pro 14” M1 Price Analysis

Role: Data Engineer | Solo

  • Scraped and analyzed used MacBook Pro listings to surface pricing and configuration trends.
  • Technologies: Python, Selenium, BeautifulSoup, Matplotlib, Seaborn
  • Automated scraping of 50 pages, feature engineering, and data visualization.
  • Source

Certifications

  • AWS Academy Graduate – Cloud Developing
  • AWS Academy Graduate – Cloud Web Application Builder
  • AWS Academy Graduate – Cloud Foundations

Extracurricular Activities

Student Mentor & Instructor, HCMUTE

  • Mentored 100+ students through 8 programming sessions (2024–2025).
  • Developed lessons on data structures, algorithms, and problem-solving strategies.

Algorithm Team Leader, Code Mely

  • Led a team of 10 to organize beginner contests with 150+ participants.
  • Supported Codeforces Rounds 963 & 983 (30K+ participants each).
  • Built test generators and checkers in C++; contributed problems to HackerRank.

Contact

Email: quangforwork1203@gmail.com
Phone: +84 935 601 729
GitHub: DOCUTEE
LinkedIn: quang-data

Pinned

  1. CFBIGDATA (Public)

    Built a basic data pipeline to automatically collect, transform and store real-time Codeforces contest data. The system performs scheduled ETL to provide live contest insights and analyze long-term…

    Python 1 2

  2. HaMu (Public)

    🚀 A tool for quickly deploying a fully containerized pseudo-distributed Hadoop cluster, making Hadoop setup faster and easier.

    Shell 8 2

  3. steam_analysis (Public)

    Developed a data platform to collect and analyze Steam game reviews, supporting review clustering and game growth prediction. Designed end-to-end workflows for data ingestion, transformation, and s…

    Python

  4. automation_email_replying (Public)

    Automated email replies using SMTP/IMAP and a local LLM, running in a Dockerized environment.

    Python