san-deep-reddy/README.md
Hi, I'm Sandeep Reddy 👋

AI Engineer | ML Infrastructure Specialist | Production AI Systems

Chicago, IL • 4+ Years Experience • End-to-End AI/ML Lifecycle Ownership

High-agency Machine Learning Engineer specializing in LLM Fine-Tuning, Distributed Training (FSDP, DeepSpeed), Inference Optimization, and building production-grade AI systems on AWS/Kubernetes.


🧠 Technical Stack

Core AI & GenAI

MLOps & Infrastructure

Languages & Databases


🚀 Professional Highlights

Since much of my work is proprietary, here is an overview of the production systems I have architected and deployed in Healthcare and Enterprise domains:

🔹 GPU Orchestration & Distributed Training

  • Fair GPU Compute Scheduler: Engineered workload orchestration on Amazon EKS with Apache YuniKorn, implementing gang scheduling and backfill algorithms to bin-pack heterogeneous multi-GPU jobs while enforcing starvation-proof priority queues.
  • Hybrid Capacity Strategy: Designed a cost-optimized GPU strategy maximizing Reserved Instance utilization for steady workloads while using Karpenter to autoscale Spot capacity for bursts.
  • Developer Tooling: Built an internal Python CLI and GitOps abstraction layer that replaced complex Kubernetes YAML for 10 research teams, standardizing distributed training and checkpointing workflows.

🔹 Large Language Models & GenAI

  • 70B+ Model Fine-Tuning: Engineered a distributed training pipeline using PyTorch FSDP to fine-tune 70B+ parameter models (Llama-3, BioMistral, Med-42, Gemma-2) on multi-node GPU clusters, leveraging LoRA/QLoRA and Quantization-Aware Training (QAT).
  • Custom FlashAttention-2 Adapters: Developed custom adapter classes to enable efficient fine-tuning of proprietary model architectures that lacked native support, increasing training throughput by 3x.
  • SOAP Notes Generation: Benchmarked proprietary APIs (GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro) against open-source LLMs for summarizing therapist-patient conversations into structured clinical documentation.
  • Production ASR Pipeline: Fine-tuned multiple Whisper model variants on custom datasets, tuning the tradeoff between word error rate (WER) and latency, and deployed them on AWS SageMaker with auto-scaling and CloudWatch monitoring.

🔹 Agentic Workflows & RAG Systems

  • FleetMind: Architected a production-ready agentic workflow using LangGraph with a Planner-Specialist-Critic pattern, enabling natural language queries over complex vehicle telemetry data.
  • Knowledge Graph Integration: Engineered a Neo4j knowledge graph to enable multi-hop relational queries that flat data structures could not support.
  • Self-Correcting Systems: Implemented custom CritiqueAgent loops to significantly improve factual accuracy of LLM-generated answers.

🔹 Big Data & MLOps at Scale

  • Revenue Impact ML: Trained and deployed ML models using Apache Spark on Databricks, processing terabytes of clickstream data to identify highly engaged visitors, contributing to a 20% revenue increase within 12 months.
  • MLOps Lifecycle: Implemented end-to-end MLOps using MLflow for experiment tracking and model registry, reducing deployment time from weeks to days while ensuring reproducibility.
  • A/B Testing & Optimization: Designed and executed A/B tests with Adobe Target, boosting click-through rate by 10% and conversion rate by 5%.

📜 Certifications & Education

  • Master of Science in Computer Science | University of Illinois Chicago, IL
  • AWS Databricks Platform Architect | View Credential
  • AWS Cloud Practitioner | View Credential

Pinned

  1. adapters Public

    Forked from adapter-hub/adapters

    A Unified Library for Parameter-Efficient and Modular Transfer Learning

    Jupyter Notebook

  2. chatbot Public

    Python

  3. cuda Public

    Practicing CUDA based on UIUC ECE-408

    Cuda

  4. distributed-task-scheduler Public

    C++

  5. fleetmind Public

    Python

  6. map-reduce Public

    Python