Currently: Building scalable ETL pipelines and ML solutions as a Data & AI Engineer
Serving 15+ international clients across US, UK, and Europe
Specialized in Snowflake architectures, real-time analytics, and production ML deployment
Location: Karachi, Pakistan | Open to: Remote opportunities globally
Education: BS Computer Science @ Sukkur IBA University | Bronze Medalist | GPA: 3.55/4.0
Ask me about: ETL Pipelines, Snowflake, AWS, ML Deployment, Data Modeling, LLM Fine-tuning
I'm a Data & AI Engineer who builds scalable data platforms and intelligent systems. My expertise spans:
- ETL Pipeline Development - Converting complex processes into robust, scalable pipelines
- Data Warehousing - Snowflake-style architectures, dimensional modeling, BigQuery
- Real-Time Analytics - SQL optimization, Looker Studio dashboards, BI solutions
- ML Deployment - AWS SageMaker, model training, and production deployment
- LLM & RLHF - Fine-tuning language models, prompt engineering, RAG systems
- Data Quality - Automated validation, normalization, and pipeline monitoring
Current Mission: Delivering production-ready data solutions that drive measurable business impact across sports, enterprise, and analytics domains.
- Optimized SQL queries to accelerate data retrieval for real-time analytics
- Designed scalable data models, enhancing data integrity and system performance
- Automated robust ETL pipelines to process unstructured data, streamlining the entire data flow
- Designed and maintained efficient data structures in Snowflake for seamless data processing
- Performed rigorous QA on data pipelines to guarantee accuracy, reliability, and optimal performance
- Created Looker Studio visualizations and implemented SQL/Python solutions to enhance data processes
- Serving 15+ international clients across US, UK, and Europe
- Converted 132 Lambda functions into 10 robust ETL pipelines, significantly enhancing performance and scalability
- Developed a Django-based system for sports data ingestion and modeling, paired with robust APIs
- Provided high-speed, transparent, and reliable data platform support for multiple sports
- Reviewed and refined LLM responses to improve accuracy and performance (RLHF)
- Solved complex DSA and ML problems under specific deadlines
- Ensured high-quality task deliverables through rigorous review and adherence to standards
End-to-End Data Platform
- Ingested and processed unstructured survey data, resolving major data quality issues along the way
- Automated data transformation in Python to split multi-response rows and create unique hash-based primary keys
- Designed a Snowflake-style normalized data model in BigQuery for scalable analytics
- Set up Looker Studio dashboards on top of BigQuery to deliver real-time insights and eliminate manual reporting
- Tech Stack: Python, BigQuery, Looker Studio, SQL
- Impact: Eliminated manual reporting and enabled real-time decision making
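The row-splitting and hash-key step described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: the field names (`respondent_id`, `question`, `answer`), the `;` delimiter, and the helpers `make_key` and `explode_responses` are all assumptions made for the example.

```python
import hashlib


def make_key(*parts: str) -> str:
    """Build a deterministic primary key by hashing the identifying fields."""
    joined = "|".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def explode_responses(rows, delimiter=";"):
    """Split each multi-response row into one row per answer.

    `rows` is a list of dicts with the hypothetical keys
    'respondent_id', 'question', and 'answer'.
    """
    out = []
    for row in rows:
        for answer in row["answer"].split(delimiter):
            answer = answer.strip()
            if not answer:
                continue  # skip empty fragments such as a trailing delimiter
            out.append({
                "id": make_key(row["respondent_id"], row["question"], answer),
                "respondent_id": row["respondent_id"],
                "question": row["question"],
                "answer": answer,
            })
    return out


# Made-up sample data, for illustration only.
raw = [
    {"respondent_id": "r1", "question": "tools", "answer": "Python; SQL"},
    {"respondent_id": "r2", "question": "tools", "answer": "SQL"},
]
rows = explode_responses(raw)
```

Because the key is derived from the row's content rather than from a counter, re-running the transformation on the same input yields the same keys, which tends to make reloads idempotent.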
Production-Grade ML System
- Cleaned patient records using Winsorization and IQR-based outlier handling to improve data quality
- Performed EDA and DBSCAN clustering to extract key features indicating diabetes risk and medication patterns
- Trained several ML models and finalized XGBoost for its superior accuracy and reliability
- Deployed on AWS SageMaker and integrated with a Django app for real-time risk prediction
- Tech Stack: Python, XGBoost, AWS SageMaker, Django, Scikit-learn
- Impact: Real-time diabetes risk assessment for healthcare providers
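As a rough illustration of the outlier handling mentioned above, the sketch below clips values to the IQR fences (Q1 - 1.5·IQR, Q3 + 1.5·IQR). This is one common way to combine the two ideas; classical Winsorization clips to fixed percentiles instead, and the project may have applied the methods differently. The function names, the `k=1.5` multiplier, and the sample readings are assumptions for the example.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences at Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def winsorize(values, k=1.5):
    """Clip values outside the IQR fences to the nearest fence."""
    lower, upper = iqr_bounds(values, k)
    return [min(max(v, lower), upper) for v in values]


# Made-up readings with one extreme value, for illustration only.
glucose = [90, 95, 100, 105, 110, 400]
cleaned = winsorize(glucose)
```

Clipping rather than dropping keeps every record in the dataset, which matters when each row is a patient and the other columns are still informative.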
AI-Powered Analytics System
- Re-engineered a single-prompt prediction system for football analytics into a multi-chain prompt architecture
- Designed structured, dedicated prompt chains that improved accuracy and control while reducing hallucinations
- Integrated LangFuse for end-to-end prompt observability and performance tracking
- Tech Stack: Python, LangChain, LangFuse, OpenAI API
- Impact: Enhanced prediction accuracy and reduced model hallucinations significantly
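The move from a single prompt to a multi-chain architecture can be sketched in plain Python as below. This is only a conceptual outline under stated assumptions: it deliberately avoids LangChain and LangFuse calls, the step names and templates are hypothetical, and `fake_llm` is a stand-in for a real model client.

```python
from typing import Callable, Dict, List

LLM = Callable[[str], str]  # any function mapping a prompt to a completion


def run_chain(llm: LLM, steps: List[Dict[str, str]],
              context: Dict[str, str]) -> Dict[str, str]:
    """Run prompt steps in order.

    Each step's output is stored under the step's name so that
    later templates can reference it.
    """
    state = dict(context)
    for step in steps:
        prompt = step["template"].format(**state)
        state[step["name"]] = llm(prompt)
    return state


# Hypothetical three-step chain replacing one large prompt.
STEPS = [
    {"name": "form",
     "template": "Summarise recent form for {home} and {away}."},
    {"name": "factors",
     "template": "Given this form summary: {form}\nList key match factors."},
    {"name": "prediction",
     "template": "Using only these factors: {factors}\nPredict {home} vs {away}."},
]


def fake_llm(prompt: str) -> str:
    """Stand-in model for local testing; echoes the prompt's first line."""
    return "[" + prompt.splitlines()[0] + "]"


result = run_chain(fake_llm, STEPS, {"home": "Team A", "away": "Team B"})
```

Keeping every intermediate output in `state` means each step can be inspected or logged on its own, which is the kind of per-step visibility an observability tool would then record.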
Specialized Skills:
AWS Services:
Bachelor of Science in Computer Science
- Bronze Medalist | GPA: 3.55/4.0
- Relevant Coursework: Artificial Intelligence, Machine Learning, OOP, Data Structures & Algorithms, Parallel & Distributed Computing, Digital Image Processing, Database Management, Linear Algebra, Statistics








