💭
always willing to contribute




About Me

I architect and build multi-modality foundation models for Physical AI — spanning the full stack from raw sensor data to deployed policies running on vehicle SoCs. My work sits at the intersection of 2D/3D Perception, End-to-End driving stacks, and Vision-Language-Action (VLA) systems that reason about the world before they act.

I design from first principles. When I build a model, I understand every layer, from tokenization strategies and attention variants to training dynamics and inference optimization for real-time hardware. I have applied this depth at BMW Techworks and Mercedes-Benz R&D, where I have led teams building production autonomous driving systems and shipped work that runs inside real vehicles.

Beyond perception and planning, I am actively developing HALO — a personal series of foundation models built entirely from scratch: a World Action Reasoning Model, a VLA, and a VLM — each designed to push the frontier of embodied intelligence.

Multi-award-winning inventor with patents filed in the US, Europe (EP), and Australia (AU), and author of a published trajectory prediction paper.


Where I Have Worked

🚗 BMW Techworks India — Senior Lead Engineer & Assistant Manager, Automated Driving
Oct 2025 – Present · Bangalore, India

Leading the architecture and implementation of a Sparse BEV-based End-to-End autonomous driving stack, owning the complete loop from data curation through policy training. Pioneered an embodied VLM-powered Scene Mining agent that indexes safety-critical scenarios and feeds them directly into the continuous retraining pipeline. The same VLM also answers natural-language commands with contextual responses drawn from historical observations.

Mercedes-Benz Research & Development India — Perception Engineer, L3 Automated Driving
Jun 2023 – Sep 2025 · Bangalore, India

Contributed to the self-driving stack powering next-generation Mercedes-Benz vehicles (the CLA and newer releases). Designed and trained multi-modality foundation models that fuse camera, LiDAR, RADAR, and language inputs for autonomous driving agents that reason before acting. Architected an E2E AD foundation model combining self-supervised learning, an imitation-learning policy, and 4D semantic occupancy. Closed the data-to-deployment loop by integrating unified models onto NVIDIA Orin for cross-platform vehicle SoC deployment.

🏆 Bronze Star Award — Revolutionized cross-functional collaboration and reengineered workflows to slash operational costs.
🏆 PAC Award — Recognized as lead inventor on multiple patents driving breakthrough intellectual property.

🔬 TCS Research & Innovation Labs — Research ML Developer, Sensorium.ai
Feb 2021 – Jun 2023 · Bangalore, India

Built and shipped a wide range of production-grade computer vision models for environmental AI, covering detection of bleeds, canopy changes, vegetation encroachment, flood events, snow, and more. Pioneered an uncertainty-aware Active Learning auto-annotation framework. Developed the SenSat and SenCV libraries to accelerate satellite and CV pipelines. Mentored engineers building ecological intelligence solutions that fuse SAR, hyperspectral, and DEM data.

🏆 IP Creation Award — Spearheaded high-impact patent filings in autonomous vision systems.

Personal Projects — Foundation Models Built from Scratch

These are not fine-tuned wrappers. Every architecture decision, every training objective, every component — engineered from first principles.

🌐 HALO-WARM

World Action Reasoning Model

Custom Transformer Decoder with three-stream tokenization across vision, language, robot state, and world query tokens. Mixture-of-Experts architecture with shared and routed experts per layer. Multi-output heads spanning language reasoning, action decoding, future depth prediction, future frame prediction, and future flow prediction. Currently adding model-based RL fine-tuning using imagined futures as a training environment for policy optimisation via PPO.
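The shared-plus-routed expert layout can be shown with a minimal NumPy sketch. This is not the HALO-WARM code. The function names, the two-layer ReLU experts, top-2 routing, and gate renormalisation are all illustrative assumptions. Shared experts process every token, while a router selects the top-k routed experts for each token and mixes their outputs by gate weight.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def ffn(x, w1, w2):
    """Two-layer ReLU feed-forward network used as one expert (illustrative)."""
    return np.maximum(x @ w1, 0.0) @ w2

def moe_layer(tokens, shared_experts, routed_experts, router_w, top_k=2):
    """tokens: (T, d). Each expert is a (w1, w2) pair; router_w: (d, n_routed)."""
    out = np.zeros_like(tokens)
    # Shared experts see every token, with no routing decision.
    for w1, w2 in shared_experts:
        out += ffn(tokens, w1, w2)
    # The router scores every token against every routed expert.
    probs = softmax(tokens @ router_w)                # (T, n_routed)
    top_idx = np.argsort(-probs, axis=-1)[:, :top_k]  # (T, top_k) chosen experts
    for t in range(tokens.shape[0]):
        gates = probs[t, top_idx[t]]
        gates = gates / gates.sum()                   # renormalise over the chosen experts
        for gate, e in zip(gates, top_idx[t]):
            w1, w2 = routed_experts[e]
            out[t] += gate * ffn(tokens[t], w1, w2)
    return out, top_idx
```

A production layer would batch the per-token loop by dispatching tokens to experts. It would also add an auxiliary load-balancing term so the router does not collapse onto a few experts. Both are omitted here for brevity.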

🤖 HALO VLA

Vision-Language-Action Model

Built from scratch with a flow-matching action-chunking decoder that replaces standard action heads, modelling continuous robot action sequences as smooth, natural trajectories. Designed for real robot deployment, where trajectory quality directly determines task success.
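Here is a rough NumPy sketch of flow matching for action chunks. It assumes a linear, rectified-flow style path from Gaussian noise to the action chunk. It is not the HALO VLA implementation, and `flow_matching_pair`, `sample_chunk`, and the plain Euler sampler are hypothetical choices. During training, the decoder regresses the velocity `actions - noise` at a random time `t`. During sampling, the learned velocity is integrated from noise to a full chunk.

```python
import numpy as np

def flow_matching_pair(actions, rng):
    """Build one training example on a linear noise-to-data path.

    actions: (H, A) chunk of H future steps with A action dimensions.
    Returns the noisy input x_t, the time t, and the target velocity.
    """
    noise = rng.standard_normal(actions.shape)
    t = rng.uniform()                      # t in [0, 1)
    x_t = (1.0 - t) * noise + t * actions  # point on the straight path
    target_velocity = actions - noise      # constant velocity along that path
    return x_t, t, target_velocity

def flow_matching_loss(pred_velocity, target_velocity):
    return float(np.mean((pred_velocity - target_velocity) ** 2))

def sample_chunk(velocity_fn, shape, rng, steps=10):
    """Euler-integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (action chunk)."""
    x = rng.standard_normal(shape)
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.standard_normal((8, 7))  # hypothetical chunk: 8 steps x 7 action dims
    # Oracle field for one known chunk: v(x, t) = (target - x) / (1 - t).
    chunk = sample_chunk(lambda x, t: (target - x) / (1.0 - t), target.shape, rng)
    print(np.allclose(chunk, target))  # True
```

In a real VLA, `velocity_fn` would be the decoder network. It is assumed here to be conditioned on vision-language features and robot state. Emitting a whole chunk per integration is what lets the policy produce coherent multi-step trajectories rather than one action at a time.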

👁️ HALO VLM

Vision-Language Model

Custom ViT and Transformer Decoder built from scratch, with causal masking and autoregressive generation. Sparse Mixture-of-Experts layer (DeepSeek-inspired) with custom routing and load balancing. Multi-Token Prediction that predicts the next n tokens simultaneously, enabling faster inference. Gradient checkpointing to significantly cut the VRAM footprint during training.
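Multi-token prediction can be sketched as one output head per future offset. Each head is trained with ordinary cross-entropy against targets shifted further ahead. The minimal NumPy version below assumes logits shaped `(n_heads, T, vocab)` and equal weighting across heads. `mtp_loss` is a hypothetical name, not the Halo-VLM API.

```python
import numpy as np

def log_softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def mtp_loss(logits, tokens):
    """logits: (n, T, V), where head k at position t scores token t + k + 1.
    tokens: (T,) integer ids. Returns the mean cross-entropy over heads."""
    n, T, _ = logits.shape
    losses = []
    for k in range(n):
        valid = T - (k + 1)  # positions that still have a target k + 1 steps ahead
        if valid <= 0:
            continue
        lp = log_softmax(logits[k, :valid])       # (valid, V)
        targets = tokens[k + 1 : k + 1 + valid]   # sequence shifted by k + 1
        losses.append(-lp[np.arange(valid), targets].mean())
    return float(np.mean(losses))
```

At inference, the extra heads can propose several tokens per forward pass. For example, they can serve as drafts that the next-token head then verifies. That is the usual source of the speed-up, though exactly how Halo-VLM uses them is not stated here.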

🧩 VL-JEPA

Joint Embedding Predictive Architecture

World model that learns abstract representations by predicting masked patches of visual and linguistic inputs entirely in latent space — no pixel-level or token-level reconstruction. Pure latent-space learning of the structure of the visual-linguistic world.
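Below is a toy NumPy sketch of a JEPA-style objective. A context encoder sees only the visible patches, and a target encoder embeds the hidden ones. A predictor then regresses those target embeddings from the context plus per-position queries. Linear "encoders", mean-pooled context, and the function names are simplifying assumptions that stand in for transformers; this is not the VL-JEPA code.

```python
import numpy as np

def jepa_loss(patches, mask, pos_emb, context_w, target_w, predictor_w):
    """patches: (N, d_in) patch or token features; mask: (N,) bool, True = hidden.
    pos_emb: (N, D); the weight matrices stand in for the encoders and predictor."""
    # Target encoder embeds the hidden patches (in practice no gradient flows here).
    targets = patches[mask] @ target_w                   # (M, D)
    # Context encoder only ever sees the visible patches.
    context = (patches[~mask] @ context_w).mean(axis=0)  # (D,) pooled context summary
    # One query per hidden position: pooled context plus that position's embedding.
    queries = context + pos_emb[mask]                    # (M, D)
    pred = queries @ predictor_w                         # (M, D)
    # Regression happens in latent space; no pixels or tokens are reconstructed.
    return float(np.mean((pred - targets) ** 2))

def ema_update(target_w, online_w, momentum=0.996):
    """Target-encoder weights trail the online (context) encoder as an EMA."""
    return momentum * target_w + (1.0 - momentum) * online_w
```

The EMA target encoder is the common JEPA choice for avoiding the trivial solution where every embedding collapses to a constant. The momentum value used here is an assumed default.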

🌿 DINO from Scratch

Self-Supervised Visual Representation

Full teacher-student self-distillation framework (Self-Distillation with No Labels) that learns high-quality visual features without any annotations. Pure self-supervised learning from raw visual signal.
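The core DINO objective fits in a short NumPy sketch. Teacher outputs are centred and sharpened with a low temperature, the student is trained with cross-entropy against them, and the teacher weights and centre are updated by EMA. The sketch compares a single pair of views; the full method sums over several global and local crops. It uses the temperatures and momenta commonly cited for DINO as assumed defaults: student 0.1, teacher 0.04, centre 0.9, teacher 0.996. It is not the code in the self-implementation-DINO repository.

```python
import numpy as np

def log_softmax(x, temp):
    z = x / temp
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def dino_loss(student_out, teacher_out, center, t_student=0.1, t_teacher=0.04):
    """student_out, teacher_out: (B, K) projection-head outputs for two different views."""
    # Teacher: subtract the running centre, then sharpen with a low temperature.
    teacher_p = np.exp(log_softmax(teacher_out - center, t_teacher))  # constant target
    student_logp = log_softmax(student_out, t_student)
    # Cross-entropy H(teacher, student), averaged over the batch.
    return float(-(teacher_p * student_logp).sum(axis=-1).mean())

def update_center(center, teacher_out, momentum=0.9):
    """Running mean of teacher outputs; centring helps prevent collapse to one dimension."""
    return momentum * center + (1.0 - momentum) * teacher_out.mean(axis=0)

def update_teacher(teacher_w, student_w, momentum=0.996):
    """Teacher weights are an EMA of the student, not trained by gradient descent."""
    return momentum * teacher_w + (1.0 - momentum) * student_w
```

Centring and sharpening pull in opposite directions. Centring alone pushes toward a uniform output, and sharpening alone pushes toward a single dominant dimension. Used together, they keep the labels-free objective from collapsing.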

🍎 Agri-Sort Grading System

360° Real-Time Vision Pipeline

State-memory vision algorithm with full 360° real-time perception for sorting and grading produce by size, shape, colour, weight, and surface quality. Transformed throughput from a small-scale manual operation to a high-volume automated pipeline.


Patents & Publications

📄 Paper · Map-Less Yet Accurate: Trajectory Prediction for Traffic Agents Using Online HD Map Reconstruction
🔒 Patent (US, EP, AU) · Autonomous task composition of vision pipelines using an algorithm selection framework
🔒 Patent · Robust Vehicle Radar System: Automatic Clutter Removal
🔒 Patent · Robust Lidar PCD for Moderate Weather
🔒 Patent · Tunnel Map Generation with Adaptive Neural Compression
🔒 Patent · Context-Aware ADAS Adaptation through VLM: Multimodal Behavioural Analytics

Technical Expertise

Foundation Models & AI Architectures

PyTorch · Hugging Face · TensorFlow · CUDA · ONNX · TensorRT

VLA · VLM · World Action Models · Diffusion Models · Flow Matching · DiT · MoE · ViT · BEVFormer · BEV-Det

Autonomous Driving & Perception

End-to-End Driving Stacks · BEV Perception · 3D Object Detection · Trajectory Prediction · 4D Semantic Occupancy · Multi-Camera Fusion · LiDAR · RADAR · HD Maps · SLAM · Multi-View Geometry

Training & Optimization

LoRA / QLoRA · QAT · Latency-Aware In-Training Pruning · Neural Architecture Search · Mixed Precision · Knowledge Distillation · Weight Sharing · Sensor Dropout · Large-Scale Distributed Training (HPC · A100 clusters)

Agentic & Generative AI

LangChain · OpenAI

RAG · LangGraph · CrewAI · Embodied Agents · Scene Mining Pipelines

Languages & Tools

Python · C++ · C · Docker · Linux · Git · ROS




Pinned

  1. object-detection-BBD (Public)

    Python 1

  2. self-implementation-DINO (Public)

    A from-scratch PyTorch implementation of DINO (Self-Distillation with No Labels) for self-supervised learning with Vision Transformers, built for learning purposes

    Jupyter Notebook 1

  3. Halo-VLM (Public template)

    Multi-token prediction in Vision-Language Models (VLMs) is an advanced training and inference technique that enables models to predict multiple future tokens simultaneously, rather than one token a…

    Python 2

  4. HaloBlocks (Public)

    Python library designed to make model experimentation seamless and fast. The goal was simple: treat every component (attention heads, MLPs, MoE layers) as a plug-and-play block so you can focus on …

    Python 5