I am a Research Software Engineer with a passion for building scalable machine learning systems and developing robust software tools for data-intensive applications.
- Applied ML: Architecting and deploying machine learning models for complex challenges, including spatio-temporal forecasting and large-scale sequence analysis.
- Data Engineering & Geospatial: Building cloud-native data platforms, like STAC APIs, to efficiently manage, process, and serve large-scale datasets.
- ML & AI: Advancing skills in modern machine learning, including statistical modeling, Conformal Prediction for reliable uncertainty quantification, and efficient fine-tuning methods (e.g., LoRA) for large transformer models.
- Cloud & MLOps: Designing and automating CI/CD pipelines for model deployment, data updates, and infrastructure management using tools like Terraform and GitHub Actions.
- Languages & Libraries: Python, R, PyTorch, TensorFlow, scikit-learn, Pandas, NumPy, Hugging Face
- ML & AI: Supervised & Unsupervised Learning, Deep Learning, Generative AI (LLMs), Statistical Modeling, Conformal Prediction
- Cloud & MLOps: Azure, Google Cloud Platform (GCP), AWS, Docker, Terraform, CI/CD, GitHub Actions, Git
- Data Engineering & Geospatial: SQL, PostgreSQL, ETL, Data Pipelines, STAC API
Real-time Spatio-Temporal Forecasting System
- Pioneered the application of Conformal Prediction to generate distribution-free 95% prediction intervals for time-series forecasts (a minimal sketch follows this list).
- Developed advanced statistical modeling approaches for zero-inflated count data, improving prediction accuracy by 20%.
- Architected and deployed a production-ready API on Azure (using RestRServe, Docker, and Terraform) for real-time data surveillance, reducing analysis time by 40%.
- Implemented a cloud-native STAC API to ingest and manage large-scale geospatial datasets (e.g., CHIRPS, MODIS), ensuring high data integrity (see the query sketch below).
- Established a CI/CD pipeline with GitHub Actions to automate monthly data updates, cutting run times by up to 95%.
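
The bullets above mention Conformal Prediction only at a high level; here is a minimal sketch of the split conformal recipe, assuming a generic regressor and synthetic lagged features as stand-ins for the actual forecasting model and surveillance data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Toy lagged-feature dataset standing in for the real spatio-temporal inputs.
X = rng.normal(size=(500, 8))
y = X[:, 0] * 2.0 + rng.normal(scale=0.5, size=500)

# Split into a proper training set and a held-out calibration set.
X_train, X_cal = X[:350], X[350:]
y_train, y_cal = y[:350], y[350:]

model = GradientBoostingRegressor().fit(X_train, y_train)

# Split conformal: absolute residuals on the calibration set are the
# nonconformity scores; take the finite-sample-corrected quantile.
alpha = 0.05
scores = np.abs(y_cal - model.predict(X_cal))
n = len(scores)
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

# 95% prediction interval for a new observation.
x_new = rng.normal(size=(1, 8))
y_hat = model.predict(x_new)
lower, upper = y_hat - q, y_hat + q
print(f"forecast {y_hat[0]:.2f}, 95% interval [{lower[0]:.2f}, {upper[0]:.2f}]")
```

The same wrapper works around any point forecaster, since the coverage guarantee comes from the calibration split rather than from the model itself.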
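For the STAC side, a short sketch of how such a catalog is typically queried from Python with `pystac-client`. The endpoint, collection name, bounding box, and date range are placeholders: the public Planetary Computer API and a Sentinel-2 collection stand in for the project's own CHIRPS/MODIS catalog.

```python
from pystac_client import Client

# Placeholder endpoint: the project's own STAC catalog would be queried
# the same way; the public Planetary Computer API stands in here.
catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

search = catalog.search(
    collections=["sentinel-2-l2a"],   # a CHIRPS or MODIS collection in practice
    bbox=[33.9, -4.7, 41.9, 5.5],     # illustrative bounding box
    datetime="2023-01-01/2023-01-31",
    max_items=10,
)

for item in search.items():
    # Each STAC item carries its acquisition time and links to the underlying assets.
    print(item.id, item.datetime, list(item.assets))
```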
Efficient Transformer Model Adaptation
- Implemented and optimized parameter-efficient fine-tuning (PEFT) methods such as LoRA for large transformer architectures, reducing computational resource needs by 60% while retaining 95% of full fine-tuning performance (a LoRA sketch follows this list).
- Engineered custom data preprocessing pipelines for complex, large-scale sequence data, enabling analysis of inputs 40% larger than the models' standard input limits allow (see the windowing sketch below).
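
A minimal sketch of LoRA-style fine-tuning with the Hugging Face `peft` library; the DistilBERT base model, rank, and target modules are illustrative choices, not the configuration used in the project.

```python
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model, TaskType

# Placeholder base model; the same recipe applies to larger transformer backbones.
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# LoRA: freeze the base weights and learn low-rank update matrices on the
# attention projections, where most of the adaptation capacity lives.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,                                # rank of the low-rank update
    lora_alpha=16,                      # scaling factor
    lora_dropout=0.05,
    target_modules=["q_lin", "v_lin"],  # DistilBERT's query/value projections
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()      # typically well under 1% of the base model
```

Because only the adapter weights are updated, the optimizer states and gradients shrink accordingly, which is where the resource savings come from.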
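The preprocessing pipeline itself is project-specific, but one common way to handle sequences longer than a transformer's input limit is overlapping-window tokenization; the sketch below illustrates that approach with placeholder model and window sizes.

```python
from transformers import AutoTokenizer

# Illustrative only: tokenize a long sequence into overlapping windows so each
# window fits the model's limit; predictions are aggregated per window downstream.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

long_text = "… a sequence far longer than the model's maximum input length …"

encoded = tokenizer(
    long_text,
    truncation=True,
    max_length=512,                  # model's hard input limit
    stride=64,                       # overlap between consecutive windows
    return_overflowing_tokens=True,  # emit every window, not just the first
)

# Each entry in input_ids is now one overlapping window of the original sequence.
print(f"{len(encoded['input_ids'])} windows of up to 512 tokens")
```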
- Deepening my understanding of advanced statistical models for complex, high-dimensional data.
- Exploring and implementing MLOps strategies to enhance the reproducibility, scalability, and monitoring of machine learning workflows.
- Researching novel approaches for applying large language models to structured and unstructured data extraction and analysis tasks.
- GitHub
- Email: maruf3141@outlook.com



