Data Engineer | ML/AI & Production MLOps | Healthcare AI & Quantitative Finance
Recent MS in Information Systems graduate (Northeastern University, August 2025) specializing in building production ML and data systems. I bridge research and production: 3 IEEE publications on explainable AI and adversarial robustness, plus real-world engineering experience building scalable data pipelines and MLOps infrastructure.
Resume: View Resume (PDF)
I am a data and machine learning professional with expertise spanning the full data lifecycle, from engineering scalable pipelines and databases to building production ML systems and delivering actionable analytics. I hold a Master of Science in Information Systems from Northeastern University (August 2025) and a Bachelor of Technology in Information Technology from Veermata Jijabai Technological Institute.
My experience bridges research and production environments. At Brigham and Women's Hospital and Northeastern's Amal Lab, I developed deep learning models for medical image classification, processed large-scale multi-omics datasets using distributed computing (Spark, Dask), and built NLP pipelines for analyzing clinical trial data. On the production side, I've architected real-time streaming platforms with Kafka and Spark, microservice APIs with FastAPI, and MLOps infrastructure (Docker, Kubernetes, CI/CD) with comprehensive observability.
Northeastern University, Boston, MA
Master of Science in Information Systems (August 2025)
Focused Coursework: Advanced Data Science & Architecture, Parallel Machine Learning & AI, LLM with Knowledge Graph Databases, Natural Language Engineering, AI Generative Modeling with focus in Finance
Veermata Jijabai Technological Institute, Mumbai, India
Bachelor of Technology in Information Technology (June 2023)
Focused Coursework: Data Structures and Algorithms, Linear Algebra, Discrete Mathematics, Artificial Intelligence, Machine Learning, Data Architecture, Network Security, Big Data Analysis
AI/ML Research Assistant, Amal Lab, Northeastern University (June 2024 – August 2025)
- Engineered Med-SAM medical image segmentation system for multi-modal datasets (MRI, CT, histopathology) with automated preprocessing pipelines and custom data loaders
- Architected scalable big data infrastructure processing large-scale TCGA multi-omics datasets using distributed computing frameworks
- Developed production-ready deep learning classification system for skin cancer detection through comparative analysis of state-of-the-art architectures with ensemble learning
Research Data Scientist, Brigham and Women's Hospital (August 2024 – December 2024)
- Architected comprehensive meta-analysis framework processing ClinicalTrials.gov and PubMed databases using machine learning algorithms
- Engineered clinical trial news analytics pipeline analyzing pharmaceutical reports using transformer-based NLP models (BERT, RoBERTa) and sentiment analysis
- Developed quantitative finance modeling system analyzing market microstructure impacts of clinical trial announcements using time-series econometric analysis
Software Engineer, HCL Technologies (June 2022 – July 2022)
- Architected enterprise-grade RPA automation system using UiPath Studio integrating web scraping algorithms and API orchestration
- Engineered intelligent document processing solution leveraging OCR technologies for structured PDF invoice processing
Programming Languages: Python, SQL, C++, Java, R, JavaScript, MATLAB, Cypher
Machine Learning & AI: TensorFlow, PyTorch, Keras, Scikit-learn, XGBoost, LightGBM, Stable-Baselines3, OpenAI Gym, Supervised Learning, Unsupervised Learning, Transfer Learning, Reinforcement Learning, Ensemble Methods, Hyperparameter Optimization
Deep Learning: CNN, LSTM, GRU, ResNet, DenseNet, EfficientNet, VGG, Vision Transformers, DQN, PPO, Distributed Data Parallel, Multi-GPU Training
Natural Language Processing: BERT, RoBERTa, Transformers (Hugging Face), NLTK, spaCy, CodeT5, VADER, TextBlob, Named Entity Recognition
Computer Vision: OpenCV, torchvision, scikit-image, Grad-CAM, Image Segmentation, Object Detection
Data Science & Analytics: NumPy, Pandas, SciPy, statsmodels, Statistical Analysis, A/B Testing, Hypothesis Testing, Time Series Forecasting, Feature Engineering, PCA, t-SNE, UMAP, Causal Inference, Monte Carlo Simulations
Data Engineering & ETL: Apache Spark, PySpark, Apache Kafka, Apache Airflow, dbt, Hadoop, Dask, ETL/ELT Pipelines, Stream Processing, Batch Processing, Data Orchestration, Great Expectations, Medallion Architecture
Databases: PostgreSQL, MySQL, MongoDB, Neo4j, SQLite, Snowflake, BigQuery, Redis, SQLAlchemy, FAISS
Cloud Platforms: AWS (EC2, S3, Lambda), GCP (BigQuery, GCS, Cloud Pub/Sub)
MLOps & Production: MLflow, Docker, Kubernetes, Docker Compose, CI/CD, GitHub Actions, Model Monitoring, Model Versioning, Prometheus, Grafana, Terraform
Software Engineering: FastAPI, Flask, REST APIs, Microservices, Git, pytest, Unit Testing, Integration Testing, Async/Await, Agile/Scrum, Jira
Quantitative Finance: Portfolio Optimization, Options Pricing, Risk Metrics (Sharpe, VaR), Time Series Modeling (ARIMA, GARCH), yFinance, Bloomberg API
- Analysis of Explainable AI Methods on Medical Image Classification - IEEE ICAECT 2023
- Adversarial Attacks and Defenses for Skin Cancer Classification - IEEE ICONAT 2023
- Intrusion Detection: A Deep Learning Approach - IEEE ICEEICT 2023
- Image Captioning Using Transformer: VisionAid - IRJET
I don't just build models; I build the entire infrastructure around them, from ETL pipelines and feature stores to model deployment and monitoring. I've published research and shipped production code, and I understand both the math and the engineering.
I'm looking for ML Engineer, Data Engineer, or Quantitative Developer roles where I can build impactful systems at the intersection of data, ML, and production software.
- Email: joganivinay@gmail.com
- Portfolio: joganivinay.wixsite.com/website
- LinkedIn: linkedin.com/in/vinayjogani
- Google Scholar: View Profile
- Scopus ID: 58030923600
- ORCiD: 0009-0005-9568-5747
