MarShaikh/README.md

Hi there 👋

I am a Research Software Engineer with a passion for building scalable machine learning systems and developing robust software tools for data-intensive applications.

💼 Professional Interests

  • Applied ML: Architecting and deploying machine learning models for complex challenges, including spatio-temporal forecasting and large-scale sequence analysis.
  • Data Engineering & Geospatial: Building cloud-native data platforms, like STAC APIs, to efficiently manage, process, and serve large-scale datasets.
  • ML & AI: Advancing skills in modern machine learning, including statistical modeling, Conformal Prediction for reliable uncertainty quantification, and efficient fine-tuning methods (e.g., LoRA) for large transformer models.
  • Cloud & MLOps: Designing and automating CI/CD pipelines for model deployment, data updates, and infrastructure management using tools like Terraform and GitHub Actions.
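
For readers unfamiliar with Conformal Prediction, the following is a minimal sketch of the split (inductive) variant using absolute residuals as the nonconformity score. It is an illustration only, not code from any project listed here; the function name `split_conformal_interval` and its arguments are hypothetical, and it assumes a held-out calibration set that is exchangeable with the new data.

```python
import math


def split_conformal_interval(cal_true, cal_pred, new_pred, alpha=0.05):
    """Return (lower, upper) around new_pred with ~(1 - alpha) coverage.

    cal_true / cal_pred: observed values and model predictions on a
    held-out calibration set (hypothetical inputs for this sketch).
    """
    if len(cal_true) != len(cal_pred):
        raise ValueError("calibration arrays must have the same length")

    # Nonconformity score: absolute residual on each calibration point.
    scores = sorted(abs(y - p) for y, p in zip(cal_true, cal_pred))
    n = len(scores)

    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: the valid
        # interval is unbounded.
        return (-math.inf, math.inf)

    q = scores[rank - 1]  # rank is 1-based
    return (new_pred - q, new_pred + q)


# Example: 99 calibration residuals, one new point forecast.
calibration_truth = [float(i) for i in range(1, 100)]
calibration_preds = [0.0] * 99
lower, upper = split_conformal_interval(
    calibration_truth, calibration_preds, new_pred=10.0, alpha=0.05
)
print(lower, upper)
```

The interval width comes entirely from the calibration residuals, so the coverage guarantee holds regardless of which forecasting model produced the predictions.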

💻 Skills

  • Languages & Libraries: Python, R, PyTorch, TensorFlow, scikit-learn, Pandas, NumPy, Hugging Face
  • ML & AI: Supervised & Unsupervised Learning, Deep Learning, Generative AI (LLMs), Statistical Modeling, Conformal Prediction
  • Cloud & MLOps: Azure, Google Cloud Platform (GCP), AWS, Docker, Terraform, CI/CD, GitHub Actions, Git
  • Data Engineering & Geospatial: SQL, PostgreSQL, ETL, Data Pipelines, STAC API

🔭 Current Projects

  • Real-time Spatio-Temporal Forecasting System

    • Pioneered the application of Conformal Prediction to generate reliable 95% uncertainty intervals for time-series forecasts.
    • Developed advanced statistical modeling approaches for zero-inflated count data, improving prediction accuracy by 20%.
    • Architected and deployed a production-ready API on Azure (using RestRserve, Docker, and Terraform) for real-time data surveillance, reducing analysis time by 40%.
    • Implemented a cloud-native STAC API to ingest and manage large-scale geospatial datasets (e.g., CHIRPS, MODIS), ensuring high data integrity.
    • Established a CI/CD pipeline with GitHub Actions to automate monthly data updates, cutting run times by up to 95%.
  • Efficient Transformer Model Adaptation

    • Implemented and optimized parameter-efficient fine-tuning (PEFT) methods like LoRA for large transformer architectures, reducing computational resource needs by 60% while retaining 95% of full fine-tuning performance.
    • Engineered custom data preprocessing pipelines for complex, large-scale sequence data, enabling the analysis of inputs 40% larger than standard model input limits allow.
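
As background on why LoRA is parameter-efficient, here is a small sketch of the arithmetic: for a frozen weight matrix of shape `d_out × d_in`, LoRA trains two low-rank factors of shapes `d_out × r` and `r × d_in` instead of the full matrix. The helper names and the example dimensions below are hypothetical and are not taken from the project above; the percentages quoted in the bullets are not derived from this calculation.

```python
def full_params(d_in, d_out):
    """Number of weights in a dense d_out x d_in matrix."""
    return d_in * d_out


def lora_trainable_params(d_in, d_out, rank):
    """Trainable weights added by LoRA: B (d_out x r) plus A (r x d_in)."""
    return rank * (d_in + d_out)


def trainable_fraction(d_in, d_out, rank):
    """Share of the full matrix size that LoRA actually trains."""
    return lora_trainable_params(d_in, d_out, rank) / full_params(d_in, d_out)


# Illustrative dimensions only (assumed, not from the project).
d_in, d_out, rank = 4096, 4096, 8
print(lora_trainable_params(d_in, d_out, rank))  # 65536
print(trainable_fraction(d_in, d_out, rank))     # 0.00390625
```

The fraction shrinks as the layer gets wider, since the adapter grows linearly with the layer dimensions while the full matrix grows quadratically.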

🌱 Ongoing Learning

  • Deepening my understanding of advanced statistical models for complex, high-dimensional data.
  • Exploring and implementing MLOps strategies to enhance the reproducibility, scalability, and monitoring of machine learning workflows.
  • Researching novel approaches for applying large language models to structured and unstructured data extraction and analysis tasks.

📫 How to Reach Me

Pinned

  1. Function to find and kill process on a given port

    #!/bin/bash

    # Function to find and kill process on a given port
    kill_process_on_port() {
        local port=$1
  2. ivy-llc/ivy (Public)

    Convert Machine Learning Code Between Frameworks

    Python · 14.2k stars · 5.6k forks

  3. r-spatial/rgee (Public)

    Google Earth Engine for R

    R · 747 stars · 158 forks