I architect and build multi-modality foundation models for Physical AI — spanning the full stack from raw sensor data to deployed policies running on vehicle SoCs. My work sits at the intersection of 2D/3D Perception, End-to-End driving stacks, and Vision-Language-Action (VLA) systems that reason about the world before they act.
I design from first principles. When I build a model, I understand every layer, from tokenization strategies and attention variants to training dynamics and inference optimization for real-time hardware. I have applied this depth at BMW Techworks and Mercedes-Benz R&D, where I led teams building production autonomous driving systems and shipped work that runs inside real vehicles.
Beyond perception and planning, I am actively developing HALO — a personal series of foundation models built entirely from scratch: a World Action Reasoning Model, a VLA, and a VLM — each designed to push the frontier of embodied intelligence.
Award-winning inventor with patents filed in the US, EP, and AU, and a published paper on trajectory prediction.
🚗 **BMW Techworks India** · *Senior Lead Engineer & Assistant Manager, Automated Driving*

Oct 2025 – Present · Bangalore, India

- Leading the architecture and implementation of a Sparse BEV-based End-to-End autonomous driving stack, owning the complete loop from data curation through policy training.
- Pioneered an embodied, VLM-powered Scene Mining agent that indexes safety-critical scenarios and feeds them directly into the continuous retraining pipeline.
- Enabling VLM-powered contextual responses to natural-language commands, grounded in historical observations.
⭐ **Mercedes-Benz Research & Development India** · *Perception Engineer, L3 Automated Driving*

Jun 2023 – Sep 2025 · Bangalore, India

- Contributed to the self-driving stack powering next-generation Mercedes-Benz vehicles (the CLA and newer releases).
- Designed and trained multimodal foundation models fusing camera, LiDAR, RADAR, and language for autonomous driving agents that reason before acting.
- Architected an E2E autonomous driving foundation model combining self-supervised learning, an imitation-learned policy, and 4D semantic occupancy.
- Closed the data-to-deployment loop by integrating the unified models onto NVIDIA Orin for cross-platform vehicle SoC deployment.
- 🏆 **Bronze Star Award**: revolutionized cross-functional collaboration and re-engineered workflows to cut operational costs.
- 🏆 **PAC Award**: recognized as lead inventor on multiple breakthrough patents.
🔬 **TCS Research & Innovation Labs** · *Research ML Developer, Sensorium.ai*

Feb 2021 – Jun 2023 · Bangalore, India

- Built and shipped production-grade computer vision models for environmental AI, detecting bleeds, canopy changes, vegetation encroachment, flood events, snow, and more.
- Pioneered an uncertainty-aware Active Learning auto-annotation framework.
- Developed the SenSat and SenCV libraries to accelerate satellite and CV pipelines.
- Mentored engineers building ecological intelligence solutions that fuse SAR, hyperspectral, and DEM data.
- 🏆 **IP Creation Award**: spearheaded high-impact patent filings in autonomous vision systems.
These are not fine-tuned wrappers. Every architecture decision, every training objective, every component — engineered from first principles.
**World Action Reasoning Model**: Custom Transformer decoder with three-stream tokenization over vision, language, and robot state, plus world query tokens. Mixture-of-Experts architecture with shared and routed experts in every layer. Multi-output heads for language reasoning, action decoding, and future depth, frame, and flow prediction. Currently adding model-based RL fine-tuning that uses imagined futures as the training environment for PPO policy optimization.
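A minimal NumPy sketch of a Mixture-of-Experts layer with shared and routed experts, in the spirit of the description above. Assumptions: every token passes through all shared experts, and a softmax router picks the top-k routed experts, whose gate weights are renormalized to sum to 1. The expert shape (a two-layer ReLU MLP), the class name `SharedRoutedMoE`, and all sizes are illustrative, not the HALO implementation.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class SharedRoutedMoE:
    """Hypothetical toy MoE layer: shared experts see every token, routed experts
    see only the tokens whose router placed them in the top-k."""

    def __init__(self, d_model, d_hidden, n_shared, n_routed, top_k, seed=0):
        rng = np.random.default_rng(seed)

        def make_expert():
            # Two-layer ReLU MLP weights (assumed expert shape).
            return (rng.normal(0.0, 0.02, (d_model, d_hidden)),
                    rng.normal(0.0, 0.02, (d_hidden, d_model)))

        self.shared = [make_expert() for _ in range(n_shared)]
        self.routed = [make_expert() for _ in range(n_routed)]
        self.router = rng.normal(0.0, 0.02, (d_model, n_routed))
        self.top_k = top_k

    @staticmethod
    def _expert(x, weights):
        w_in, w_out = weights
        return np.maximum(x @ w_in, 0.0) @ w_out

    def __call__(self, x):
        """x: (n_tokens, d_model). Returns (output, chosen routed-expert ids)."""
        out = np.zeros_like(x)
        for weights in self.shared:                    # always-on shared experts
            out += self._expert(x, weights)

        probs = softmax(x @ self.router)               # (n_tokens, n_routed)
        top_ids = np.argsort(-probs, axis=-1)[:, :self.top_k]
        gates = np.take_along_axis(probs, top_ids, axis=-1)
        gates /= gates.sum(axis=-1, keepdims=True)     # renormalize over the top-k

        for e, weights in enumerate(self.routed):
            rows, slots = np.nonzero(top_ids == e)     # tokens routed to expert e
            if rows.size:                              # each token picks e at most once
                out[rows] += gates[rows, slots][:, None] * self._expert(x[rows], weights)
        return out, top_ids
```

The Python loop over experts keeps the dispatch readable; production MoE kernels usually group tokens by expert instead, and typically add an auxiliary load-balancing term, which this sketch omits.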
**Vision-Language-Action Model**: Built from scratch with a flow-matching action-chunking decoder that replaces standard action heads, modeling continuous robot action sequences as smooth, natural trajectories. Designed for real-robot deployment, where trajectory quality directly determines task success.
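A minimal NumPy sketch of conditional flow matching for action chunks, assuming a straight-line (rectified-flow style) path from Gaussian noise at t = 0 to the ground-truth chunk at t = 1. The function names are hypothetical, and `velocity_fn` stands in for the decoder network, which in a VLA would also be conditioned on vision and language tokens; that conditioning is omitted here.

```python
import numpy as np


def flow_matching_targets(actions, rng):
    """Build one flow-matching training example per chunk.

    actions: (batch, horizon, action_dim) ground-truth action chunks (x1).
    Returns the interpolant x_t, the time t, and the regression target
    v* = x1 - x0 for a linear probability path.
    """
    batch = actions.shape[0]
    noise = rng.standard_normal(actions.shape)          # x0 ~ N(0, I)
    t = rng.uniform(0.0, 1.0, size=(batch, 1, 1))       # one time per chunk
    x_t = (1.0 - t) * noise + t * actions               # point on the straight path
    target_velocity = actions - noise                   # dx_t/dt along that path
    return x_t, t, target_velocity


def flow_matching_loss(pred_velocity, target_velocity):
    # Mean squared error between predicted and target velocities.
    return float(np.mean((pred_velocity - target_velocity) ** 2))


def sample_action_chunk(velocity_fn, shape, n_steps, rng):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 (noise) to t=1 with Euler steps."""
    x = rng.standard_normal(shape)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        x = x + dt * velocity_fn(x, t)
    return x
```

During training, the decoder regresses `target_velocity` from `(x_t, t, observation)` with `flow_matching_loss`; at inference, a handful of Euler steps turn noise into a whole chunk of actions at once. With the oracle field `(goal - x) / (1 - t)`, for example, the last Euler step lands exactly on `goal`.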
**Vision-Language Model**: Custom ViT and Transformer decoder built from scratch, with causal masking and autoregressive generation. Sparse Mixture-of-Experts layer (DeepSeek-inspired) with custom routing and load balancing. Multi-Token Prediction that generates the next n tokens simultaneously for faster inference. Gradient checkpointing to significantly cut the training VRAM footprint by recomputing activations during the backward pass.
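A minimal NumPy sketch of a Multi-Token Prediction objective, assuming n independent linear output heads on a shared, causally masked trunk, where head j (0-based) predicts the token j + 1 positions ahead. This parallel-heads variant is an assumption; DeepSeek-V3, for example, chains sequential MTP modules instead. The function names are hypothetical.

```python
import numpy as np


def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def mtp_loss(hidden, heads, tokens):
    """hidden: (seq_len, d_model) trunk outputs, one per position.
    heads:  list of n (d_model, vocab) projections; head j predicts tokens[i + j + 1].
    tokens: (seq_len,) integer ids of the same sequence.
    Returns the cross-entropy averaged over heads."""
    seq_len = tokens.shape[0]
    per_head = []
    for j, w in enumerate(heads):
        offset = j + 1
        n_valid = seq_len - offset            # positions that still have a target
        if n_valid <= 0:
            continue
        logp = log_softmax(hidden[:n_valid] @ w)        # (n_valid, vocab)
        targets = tokens[offset:]                       # (n_valid,)
        per_head.append(-logp[np.arange(n_valid), targets].mean())
    return float(np.mean(per_head))


def draft_next_tokens(last_hidden, heads):
    """Greedy draft of the next n tokens from a single trunk pass (one per head)."""
    return [int(np.argmax(last_hidden @ w)) for w in heads]
```

Drafted tokens are typically verified before being accepted (speculative-decoding style), so the actual speedup depends on how often the extra heads agree with the main head.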
**Joint Embedding Predictive Architecture (JEPA)**: A world model that learns abstract representations by predicting the latent representations of masked patches of visual and linguistic inputs, entirely in latent space, with no pixel-level or token-level reconstruction. Pure latent-space learning of the structure of the visual-linguistic world.
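A toy NumPy sketch of the JEPA idea: encode only the visible patches, predict the latent embedding of each masked patch, and score the prediction against a target encoder's embedding rather than against pixels. The linear encoders, mean-pooled context, positional embeddings, MSE loss, and EMA target update are simplifying assumptions (I-JEPA, for instance, uses ViT encoders and a Transformer predictor), `ToyJEPA` is a hypothetical name, and gradients are not computed here.

```python
import numpy as np


class ToyJEPA:
    """Hypothetical linear JEPA: context encoder, EMA target encoder, predictor."""

    def __init__(self, patch_dim, d_latent, n_patches, seed=0):
        rng = np.random.default_rng(seed)
        self.w_context = rng.normal(0.0, 0.1, (patch_dim, d_latent))
        self.w_target = self.w_context.copy()      # updated only by EMA, never by gradients
        self.w_pred = rng.normal(0.0, 0.1, (d_latent, d_latent))
        self.pos = rng.normal(0.0, 0.1, (n_patches, d_latent))  # tells the predictor where to predict

    def loss(self, patches, target_mask):
        """patches: (n_patches, patch_dim); target_mask: bool (n_patches,), True = masked."""
        ctx = patches[~target_mask] @ self.w_context     # encode visible patches only
        summary = ctx.mean(axis=0)                        # pooled context
        idx = np.nonzero(target_mask)[0]
        pred = (summary + self.pos[idx]) @ self.w_pred   # predicted latent per masked patch
        target = patches[target_mask] @ self.w_target    # target latents (stop-gradient in training)
        return float(np.mean((pred - target) ** 2))     # loss lives in latent space

    def ema_update(self, momentum=0.996):
        self.w_target = momentum * self.w_target + (1.0 - momentum) * self.w_context
```

In training, only the context encoder and predictor would receive gradients; the target encoder follows them through `ema_update`, which is one common way to keep the latent targets from collapsing.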
**Self-Supervised Visual Representation**: A full teacher-student self-distillation framework (DINO: self-distillation with no labels) that learns high-quality visual features without any annotations. Pure self-supervised learning from the raw visual signal.
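A minimal NumPy sketch of the DINO objective: the teacher's outputs are centered and sharpened with a low temperature, the student is trained with cross-entropy to match them across different augmented views, and the teacher's weights track the student's by EMA. The defaults follow values reported for DINO (student temperature 0.1, teacher 0.04, center momentum 0.9, teacher momentum starting at 0.996), but treat them as assumptions; the function names are hypothetical, and networks and augmentations are omitted.

```python
import numpy as np


def log_softmax(z, temp):
    z = z / temp
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def dino_pair_loss(student_out, teacher_out, center, student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy H(teacher, student) for one (teacher view, student view) pair.
    Both inputs: (batch, K) projection-head outputs."""
    teacher_probs = np.exp(log_softmax(teacher_out - center, teacher_temp))  # center, then sharpen
    student_logp = log_softmax(student_out, student_temp)
    return float(-(teacher_probs * student_logp).sum(axis=-1).mean())


def dino_loss(student_views, teacher_views, center, **temps):
    """Teacher sees the global views; the student sees every view (globals first).
    A view is never matched with itself."""
    total, n_pairs = 0.0, 0
    for ti, t_out in enumerate(teacher_views):
        for si, s_out in enumerate(student_views):
            if si == ti:
                continue
            total += dino_pair_loss(s_out, t_out, center, **temps)
            n_pairs += 1
    return total / n_pairs


def update_center(center, teacher_views, momentum=0.9):
    batch_mean = np.concatenate(teacher_views, axis=0).mean(axis=0)
    return momentum * center + (1.0 - momentum) * batch_mean


def ema_update(teacher_params, student_params, momentum=0.996):
    """Teacher weights follow the student; the teacher receives no gradients."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher_params, student_params)]
```

Centering discourages any single output dimension from dominating but pushes toward a uniform output; sharpening has the opposite effect, and applying both is what keeps the teacher from collapsing.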
**360° Real-Time Vision Pipeline**: A state-memory vision algorithm with full 360° real-time perception for sorting and grading produce by size, shape, color, weight, and surface quality. Scaled throughput from a small-scale manual operation to a high-volume automated pipeline.
| Type | Title |
|---|---|
| 📄 Paper | Map-Less Yet Accurate: Trajectory Prediction for Traffic Agents Using Online HD Map Reconstruction |
| 🔒 Patent (US, EP, AU) | Autonomous task composition of vision pipelines using an algorithm selection framework |
| 🔒 Patent | Robust Vehicle Radar System — Automatic Clutter Removal |
| 🔒 Patent | Robust Lidar PCD for Moderate Weather |
| 🔒 Patent | Tunnel Map Generation with Adaptive Neural Compression |
| 🔒 Patent | Context-Aware ADAS Adaptation through VLM — Multimodal Behavioural Analytics |
VLA · VLM · World Action Models · Diffusion Models · Flow Matching · DiT · MoE · ViT · BEVFormer · BEVDet
End-to-End Driving Stacks · BEV Perception · 3D Object Detection · Trajectory Prediction · 4D Semantic Occupancy · Multi-Camera Fusion · LiDAR · RADAR · HD Maps · SLAM · Multi-View Geometry
LoRA / QLoRA · QAT · Latency-Aware In-Training Pruning · Neural Architecture Search · Mixed Precision · Knowledge Distillation · Weight Sharing · Sensor Dropout · Large-Scale Distributed Training (HPC · A100 clusters)
RAG · LangGraph · CrewAI · Embodied Agents · Scene Mining Pipelines
