Sandeep Reddy san-deep-reddy

Hi, I'm Sandeep Reddy 👋

AI Engineer | ML Infrastructure Specialist | Production AI Systems

Chicago, IL • 4+ Years Experience • End-to-End AI/ML Lifecycle Ownership

High-agency Machine Learning Engineer specializing in LLM Fine-Tuning, Distributed Training (FSDP, DeepSpeed), Inference Optimization, and building production-grade AI systems on AWS/Kubernetes.

🧠 Technical Stack

Core AI & GenAI

MLOps & Infrastructure

Languages & Databases

🚀 Professional Highlights

Since much of my work is proprietary, here is an overview of the production systems I have architected and deployed in Healthcare and Enterprise domains:

🔹 GPU Orchestration & Distributed Training

Fair GPU Compute Scheduler: Engineered workload orchestration on Amazon EKS with Apache YuniKorn, implementing gang scheduling and backfill algorithms to bin-pack heterogeneous multi-GPU jobs while enforcing starvation-proof priority queues.
Hybrid Capacity Strategy: Designed a cost-optimized GPU strategy maximizing Reserved Instance utilization for steady workloads while using Karpenter to autoscale Spot capacity for bursts.
Developer Tooling: Built an internal Python CLI and GitOps abstraction layer that replaced complex Kubernetes YAML for 10 research teams, standardizing distributed training and checkpointing workflows.
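To make the gang-scheduling and backfill terms above concrete, here is a toy sketch in plain Python. It is not the production scheduler and does not use YuniKorn's API; the `Job` class and `schedule` function are hypothetical names chosen for illustration. A job is admitted only if all of its GPUs fit at once (the gang rule), and smaller jobs may fill GPUs that a blocked, higher-priority job cannot use (backfill).

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    gpus: int      # GPUs the whole gang needs at the same time
    priority: int  # higher value is considered first


def schedule(jobs, free_gpus):
    """Return the names of jobs admitted in one scheduling pass.

    Gang rule: a job is admitted only if ALL of its GPUs fit right now.
    Backfill: when a higher-priority job does not fit, lower-priority
    jobs that do fit may take the idle GPUs.
    """
    admitted = []
    for job in sorted(jobs, key=lambda j: -j.priority):
        if job.gpus <= free_gpus:
            free_gpus -= job.gpus
            admitted.append(job.name)
    return admitted


# Hypothetical workload: the 8-GPU job cannot start on 6 free GPUs,
# so the two smaller jobs backfill the idle capacity.
jobs = [Job("pretrain", 8, 3), Job("finetune", 4, 2), Job("eval", 2, 1)]
print(schedule(jobs, free_gpus=6))  # ['finetune', 'eval']
```

This sketch deliberately omits starvation protection: a real scheduler would also reserve capacity for the blocked high-priority job so that a stream of small jobs cannot delay it indefinitely.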

🔹 Large Language Models & GenAI

70B+ Model Fine-Tuning: Engineered a distributed training pipeline using PyTorch FSDP to fine-tune 70B+ parameter models (Llama-3, BioMistral, Med-42, Gemma-2) on multi-node GPU clusters, leveraging LoRA/QLoRA and Quantization-Aware Training (QAT).
Custom FlashAttention-2 Adapters: Developed custom adapter classes to enable efficient fine-tuning of proprietary model architectures that lacked native support, increasing training throughput by 3x.
SOAP Notes Generation: Benchmarked proprietary APIs (GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro) against open-source LLMs for summarizing therapist-patient conversations into structured clinical documentation.
Production ASR Pipeline: Fine-tuned multiple Whisper model variants on custom datasets, optimizing the word error rate (WER) vs. latency tradeoff, and deployed them on Amazon SageMaker with auto-scaling and CloudWatch monitoring.
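For readers unfamiliar with the WER metric mentioned above, a minimal sketch of how it is commonly computed follows: the word-level edit distance between a reference transcript and the model's hypothesis, divided by the number of reference words. The function name `wer` and the sample sentences are illustrative assumptions, not taken from the production pipeline.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of five reference words.
print(wer("the patient reports mild pain", "the patient report mild pain"))  # 0.2
```

Production evaluations usually normalize text (casing, punctuation, numerals) before scoring; that step is left out here.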

🔹 Agentic Workflows & RAG Systems

FleetMind: Architected a production-ready agentic workflow using LangGraph with a Planner-Specialist-Critic pattern, enabling natural language queries over complex vehicle telemetry data.
Knowledge Graph Integration: Engineered a Neo4j knowledge graph to enable multi-hop relational queries that flat data structures could not support.
Self-Correcting Systems: Implemented custom CritiqueAgent loops to significantly improve factual accuracy of LLM-generated answers.
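The self-correcting loop described above can be sketched in plain Python as a generate-critique-retry cycle. This is an illustrative outline only: it does not use LangGraph, and `answer_with_critique`, `fake_generate`, and `fake_critique` are hypothetical stand-ins for the real agents, with canned strings in place of LLM calls.

```python
def answer_with_critique(question, generate, critique, max_rounds=3):
    """Draft an answer, ask a critic for feedback, and retry until accepted.

    `generate(question, feedback)` returns a draft answer.
    `critique(question, draft)` returns None to accept, or feedback text.
    The last draft is returned if no draft is accepted within max_rounds.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback = None
    for _ in range(max_rounds):
        draft = generate(question, feedback)
        feedback = critique(question, draft)
        if feedback is None:
            return draft
    return draft


# Stub agents standing in for LLM calls (hypothetical behavior).
def fake_generate(question, feedback):
    return "42 vehicles" if feedback else "about 40 vehicles"


def fake_critique(question, draft):
    return None if draft == "42 vehicles" else "Give the exact count."


print(answer_with_critique("How many vehicles idled over an hour?",
                           fake_generate, fake_critique))  # 42 vehicles
```

Bounding the loop with `max_rounds` keeps latency and cost predictable when the critic never accepts a draft.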

🔹 Big Data & MLOps at Scale

Revenue Impact ML: Trained and deployed ML models using Apache Spark on Databricks, processing terabytes of clickstream data to identify highly engaged visitors, contributing to a 20% revenue increase within 12 months.
MLOps Lifecycle: Implemented end-to-end MLOps using MLflow for experiment tracking and model registry, reducing deployment time from weeks to days while ensuring reproducibility.
A/B Testing & Optimization: Designed and executed A/B tests with Adobe Target, boosting click-through rate by 10% and conversion rate by 5%.
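As a worked illustration of how an A/B conversion lift like the one above is typically checked for significance, here is a two-proportion z-test using only the standard library. The visitor and conversion counts are made-up placeholders, not the actual experiment data, and this is not a description of how Adobe Target computes its own statistics.

```python
import math


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: control converts 500 of 10,000 visitors,
# the variant converts 550 of 10,000.
z, p = two_proportion_z(500, 10_000, 550, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value suggests the observed lift is unlikely to be noise; the threshold (commonly 0.05) should be fixed before the test starts.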

📜 Certifications & Education

Master of Science in Computer Science | University of Illinois Chicago, IL
AWS Databricks Platform Architect | View Credential
AWS Cloud Practitioner | View Credential
