I’m an AI researcher and data scientist with an M.Tech in Data Science from IIT Madras and a B.Tech in Computer Science from VIT. My core focus lies at the intersection of Reinforcement Learning (RL), Large Language Models (LLMs), and Deep Learning, where I strive to push the boundaries of AI capabilities.
Research Interests:
- Reinforcement Learning for LLMs (RL4LLMs)
- Multi-Agent Reinforcement Learning (MARL)
- Multi-Agent Reasoning (MAR)
- Communication in Multi-Agent Systems (MAS-Comm)
- Causality in Reinforcement Learning (Causality & RL)
- Representation Learning for Reinforcement Learning (RepL4RL)
I’m seeking a PhD or research position where I can drive innovative AI projects alongside a collaborative team.
- M.Tech Thesis: DNF-Net: A DL Approach for Advancing Breast Cancer Detection in Histopathology Images. (Poster / PPT)
- Built a magnification-invariant hybrid model that combines fuzzy logic, used to handle diagnostic uncertainty explicitly, with deep-learning backbones (Xception, InceptionV3, DenseNet-169) for hierarchical feature extraction. The model achieved a 5% accuracy gain over SOTA on the BreakHis and BACH histopathology datasets, validated at 40×, 100×, 200×, and 400× magnifications and across 2-, 4-, and 8-class tasks.
- Keywords: deep-learning; fuzzy-logic; magnification-invariance; medical-image-analysis; histopathology; image-classification
- B.Tech Thesis: CXRcovNet: COVID‑19 detection from CXR images using transfer learning approaches. (Repo / PPT)
- Applied transfer learning with pre-trained CNN models to detect COVID-19 in chest X-ray (CXR) images.
- Keywords: computer-vision, deep-learning, transfer-learning, covid-19, cxr, image-classification
- Reinforcement Fine-Tuning LLMs with GRPO (Repo)
- Investigated the efficacy of Group Relative Policy Optimization (GRPO) for reinforcement fine-tuning (RFT) of LLMs, adapting models for complex reasoning and strategic tasks (demonstrated via a Wordle-style game with Qwen 2.5 7B).
- Tech Stack: Python, PyTorch, RL, LLMs, GRPO
- Keywords: rlft, grpo, llms, reinforcement-learning, fine-tuning, Reward functions, Reward hacking, Calculating loss in GRPO
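For readers new to GRPO, the core idea is that each sampled completion is scored relative to the other completions drawn for the same prompt, so no separate value network is needed. The sketch below illustrates only that group-relative advantage step. The function name, the example rewards, and the choice of population standard deviation are illustrative assumptions, not code taken from the repo.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each completion's reward against its own sampling group.

    rewards: scalar rewards for several completions of ONE prompt.
    eps:     small constant that avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled guesses to one Wordle-style prompt.
advantages = group_relative_advantages([0.0, 1.0, 0.5, 0.5])
```

Completions that beat the group average receive a positive advantage and are reinforced; those below it are discouraged. If every completion earns the same reward, all advantages are zero and the group contributes no gradient signal.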
- Hierarchical Reinforcement Learning (IITM CS6700 PA3) (Repo)
- Implemented and evaluated Hierarchical RL techniques (SMDP Q-Learning, Intra-Option Q-Learning) in the Taxi-v3 environment, analyzing the impact of option design on learning efficiency and policy structure.
- Tech Stack: Python, RL (Hierarchical RL, Q-Learning), OpenAI Gym
- Keywords: hierarchical-rl, smdp, intra-option-q-learning, reinforcement-learning, taxi-v3
- Dueling-DQN & Monte Carlo REINFORCE (IITM CS6700 PA2) (Repo)
- Implemented and compared Dueling-DQN (Type-1 vs Type-2) and Monte Carlo REINFORCE (with/without baseline) algorithms on Acrobot-v1 and CartPole-v1 environments.
- Tech Stack: Python, PyTorch, RL (DQN, Policy Gradient), OpenAI Gym
- Keywords: dueling-dqn, reinforce, baseline, deep-reinforcement-learning, acrobot-v1, cartpole-v1
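As a rough illustration of the Dueling-DQN comparison, the sketch below assumes that "Type-1" and "Type-2" refer to the two standard ways of combining the value and advantage streams: subtracting the mean advantage or subtracting the max advantage. That mapping, the function name, and the `mode` argument are assumptions made for this example rather than details confirmed by the repo.

```python
def dueling_q_values(value, advantages, mode="mean"):
    """Combine a state value V(s) and per-action advantages A(s, a) into Q(s, a).

    mode="mean": Q = V + (A - mean(A))   (assumed to correspond to Type-1)
    mode="max":  Q = V + (A - max(A))    (assumed to correspond to Type-2)
    """
    if mode == "mean":
        baseline = sum(advantages) / len(advantages)
    elif mode == "max":
        baseline = max(advantages)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return [value + a - baseline for a in advantages]


q_mean = dueling_q_values(1.0, [1.0, 3.0], mode="mean")
q_max = dueling_q_values(1.0, [1.0, 3.0], mode="max")
```

With the max variant, the greedy action's Q-value equals V(s) exactly; with the mean variant, the Q-values average to V(s), which tends to be the more stable choice in practice.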
- Temporal Difference Learning (SARSA & Q-Learning) (IITM CS6700 PA1) (Repo)
- Implemented and compared TD algorithms (SARSA and Q-Learning) in a custom 10x10 Grid World with stochastic transitions and wind effects, building a strong base in core RL concepts.
- Tech Stack: Python, RL (TD Learning, Q-Learning, SARSA), NumPy, Matplotlib
- Keywords: Temporal Difference, SARSA, Q-Learning, Gridworld, Reinforcement Learning, Stochastic Environments
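The difference between the two TD algorithms comes down to the bootstrap target: SARSA uses the action the agent actually takes next (on-policy), while Q-Learning uses the greedy action (off-policy). The minimal sketch below shows only that distinction. The function names and default hyperparameters are illustrative and are not taken from the assignment code.

```python
def sarsa_target(reward, q_next, next_action, gamma=0.99):
    """On-policy target: bootstrap from the action actually chosen next."""
    return reward + gamma * q_next[next_action]


def q_learning_target(reward, q_next, gamma=0.99):
    """Off-policy target: bootstrap from the greedy next action."""
    return reward + gamma * max(q_next)


def td_update(q_sa, target, alpha=0.1):
    """Move the current estimate Q(s, a) a step of size alpha toward the target."""
    return q_sa + alpha * (target - q_sa)


# q_next holds Q(s', a) for each action a in the next state s'.
q_next = [0.0, 2.0]
on_policy = sarsa_target(1.0, q_next, next_action=0, gamma=0.5)
off_policy = q_learning_target(1.0, q_next, gamma=0.5)
```

When the exploratory next action is not the greedy one, as in this example, the two targets differ, which is why SARSA tends to learn more cautious paths in stochastic or windy grid worlds.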
- Feedforward Neural Networks (FNN) from Scratch (IITM CS6910 PA1) (Repo / W&B Report)
- Built an end-to-end NumPy-only FNN for Fashion-MNIST classification, integrating six optimizers (SGD, Momentum, NAG, RMSProp, Adam, Nadam), four activations (sigmoid, tanh, ReLU, softmax), two losses (MSE, Cross-Entropy), weight initialization (Xavier, random), regularization (L1, L2), early stopping, and W&B-driven hyperparameter sweeps.
- Tech Stack: Python, NumPy, Matplotlib, Seaborn, Scikit-learn, Weights & Biases
- Keywords: feedforward-NN, backpropagation, optimizers, activation-functions, initialization, regularization, hyperparameter-tuning
- Convolutional Neural Networks (CNN) (IITM CS6910 PA2) (Repo / W&B Report)
- A two-part project: (i) trained a CNN from scratch in PyTorch, with Bayesian hyperparameter optimization via W&B sweeps (tuning filters, kernel sizes, batch norm, dropout, augmentation) plus filter visualization and guided backpropagation for interpretability; and (ii) fine-tuned a pre-trained CNN model for performance benchmarking and comparison.
- Tech Stack: Python, PyTorch, OpenCV, Weights & Biases
- Keywords: CNN, Hyperparameter Optimization, Bayesian Optimization, Data Augmentation, Filter Visualization, Guided Backpropagation, Interpretability, W&B
- Sequence-to-Sequence Learning (RNN) (IITM CS6910 PA3) (Repo / W&B Report)
- Developed and evaluated sequence-to-sequence models (vanilla RNN, LSTM, GRU) with and without attention mechanisms for English-to-Malayalam transliteration (Aksharantar dataset), analyzing the impact of architectural choices and attention on transliteration quality.
- Tech Stack: Python, PyTorch, Weights & Biases
- Keywords: Seq2Seq, Attention Mechanisms, RNN, LSTM, GRU, Transliteration, Encoder-Decoder, Attention Heatmaps, NLP
- Advanced Information Retrieval System (IITM CS6370) (Repo / Report)
- Built a hybrid search engine combining TF–IDF VSM, LSA, and a BERT-based reranker for top-k retrieval, with end-to-end evaluation (Precision@k, MAP, nDCG) on the Cranfield and Brown corpora.
- Tech Stack: Python, Scikit-learn, Gensim, PyTorch, Transformers
- Keywords: Information Retrieval, TF–IDF, LSA, ESA, Word2Vec, BERT Reranking, Evaluation Metrics, NLP, Semantic Search
- Mathematical Essays on Core ML Algorithms
- Authored a series of mathematical essays (formatted in IEEE style using LaTeX) dissecting the theoretical underpinnings, derivations, and applications of fundamental ML algorithms.
- Tech Stack: LaTeX, Python (for supporting visualizations/analysis)
- Keywords: ML Theory, Math Foundations, Linear Regression, Logistic Regression, Decision Trees, Random Forest, Naive Bayes, SVM, LaTeX
- Beyond the Horizon: Exploring the Impact of AI on Early Cancer Detection & Diagnosis — A Comprehensive Review
- Journal: Computers in Biology and Medicine (Impact Factor: 7.7)
- Submission Date: January 2025
- Manuscript ID: CIBM-D-25-00543
- Status: Under Review
Certificate/Specialization | Provider | Date Completed | Link ID |
---|---|---|---|
Advanced Large Language Model Agents | UC Berkeley | May 2025 | Soon, May 31, 2025 |
Linguistic Linked Data – Advanced Topics | German UDS Academy | May 2025 | View Certificate |
Linguistic Linked Data – Essentials | German UDS Academy | Apr 2025 | View Certificate |
Natural Language Processing | Udemy, Inc. | Aug 2023 | View Certificate |
The Complete Python Bootcamp | Udemy, Inc. | Aug 2023 | View Certificate |
Mathematics for ML & DS Specialization | DeepLearning.AI | Jun 2023 | View Certificate |
Machine Learning Specialization | DeepLearning.AI | Jan 2023 | View Certificate |
Google Data Analytics Specialization | | Apr 2022 | View Certificate |