Ananta Sinha ananta-mimo

Hi, I'm Ananta Sinha, Ph.D.

Data Scientist | Production ML & Computer Vision | Ph.D., Engineering

I am a Data Scientist with a Ph.D. in Engineering and 5+ years of experience designing and deploying production ML systems across computer vision, time-series forecasting, and causal inference. I work end-to-end, from feature engineering and model validation through cloud deployment and stakeholder communication.

My background bridges rigorous statistical research and applied data science, with peer-reviewed publications and production systems deployed in federally funded and commercial analytics projects.

🔭 Currently working on: gwnbr, an open-source Python package for Geographically Weighted Negative Binomial Regression, alongside a JOSS submission and an applied case study.

What I Do

Machine Learning & AI Solutions: Build and deploy ML pipelines for classification, prediction, and anomaly detection using Python, PyTorch, scikit-learn, and Spark.
Time-Series & Forecasting: Develop predictive models for demand, system performance, and rare-event risks using ARIMA, SARIMAX, LSTM, and Extreme Value Theory.
Computer Vision at Production Scale: Design and deploy CNN and HRNet pipelines on cloud infrastructure for feature extraction from high-resolution imagery.
Causal Inference & Statistical Modeling: Apply causal inference, hypothesis testing, and experimental design to quantify drivers of behavioral and operational outcomes.
Data-Driven Decision Support: Translate analytical results into clear strategies with dashboards, SHAP-based explainability, and storytelling that empowers technical and non-technical stakeholders alike.
End-to-End Data Science Lifecycle: Experience across data acquisition, ETL, feature engineering, model validation, deployment, and monitoring in both cloud and on-prem environments.

Featured Open-Source Work

gwnbr: a modular Python package implementing Geographically Weighted Negative Binomial Regression (GWNBR) that translates a SAS macro by Silva & Rodrigues (2014) into an open-source, peer-reviewable tool.

Three model classes (GWNBR, GWNBRg, GWPR), multiple kernel functions, Golden Section Search bandwidth selection, Newton-Raphson and IRLS solvers, and a 28-test pytest suite
Validated on a 1,460-unit spatial study with ~42x overdispersion: GWNBR outperformed GWPR by a wide margin (AICc 14,147 vs. 38,664; lower is better)
MIT licensed, archived on Zenodo (DOI: 10.5281/zenodo.21041972), JOSS submission in progress
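The bandwidth-selection step listed above can be illustrated with a generic golden-section search. This is a minimal sketch, not the gwnbr API: the names `bisquare_weights` and `golden_section_search` are hypothetical, the fixed bisquare kernel is just one common kernel choice, and any criterion passed in (for example AICc as a function of bandwidth) is assumed to be unimodal over the search interval.

```python
import math
from typing import Callable, List, Sequence

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi, about 0.618


def bisquare_weights(distances: Sequence[float], bandwidth: float) -> List[float]:
    """Fixed bisquare kernel (illustrative): (1 - (d/h)^2)^2 for d < h, otherwise 0."""
    return [
        (1.0 - (d / bandwidth) ** 2) ** 2 if d < bandwidth else 0.0
        for d in distances
    ]


def golden_section_search(
    criterion: Callable[[float], float],
    lower: float,
    upper: float,
    tol: float = 1e-4,
    max_iter: int = 200,
) -> float:
    """Minimize a unimodal criterion (e.g. AICc vs. bandwidth) on [lower, upper]."""
    a, b = lower, upper
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = criterion(c), criterion(d)
    for _ in range(max_iter):
        if b - a < tol:
            break
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d, so only one new evaluation.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = criterion(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = criterion(d)
    return (a + b) / 2.0
```

For example, `golden_section_search(lambda h: (h - 37.0) ** 2 + 100.0, 10.0, 200.0)` converges to a bandwidth of about 37; the toy quadratic is only a stand-in for a real criterion. In a GWR-style fit each criterion evaluation means refitting every local model, so reusing one interior point per iteration (one new evaluation instead of two) is the main appeal of golden-section search here.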

Projects

Computer Vision at Production Scale: Built and deployed an HRNet inference pipeline with a custom sliding 3×3 tile radial weighting strategy on Azure, cutting data collection costs by 80% ($100K+ to under $20K) and turnaround time from months to two weeks.
Multi-Modal CNN Pipeline: Designed an end-to-end CNN pipeline (PyTorch, Azure ML) integrating imagery, geospatial, and text data, achieving 92% feature extraction accuracy across 20+ jurisdictions.
Anomaly Detection & Rare-Event Forecasting: Developed LSTM/ARIMA anomaly detection and Extreme Value Theory frameworks on large continuous sensor streams, cutting calibration costs by 65% and reducing risk exposure by 70%.
Causal Inference for Behavioral Modeling: Applied causal inference methods to quantify marginal contributions of exogenous signals on behavioral outcomes, generating feature importance insights for decision support systems.
Binary Classification on Behavioral Data: Built logistic regression and XGBoost models on multi-signal behavioral and sensor datasets, validated with ROC/AUC and confusion matrix analysis, improving predictive accuracy by 40%.
Behavioral Modeling for Policy Decisions: Designed and validated regression-based models of driver-pedestrian interactions to guide infrastructure investment and safety improvements.
Freight & Logistics Analytics: Built demand models for truck parking, freight generators, and corridor bottlenecks using spatial-temporal analysis and big data sources: weigh-in-motion (WIM), INRIX, and Replica.

Technical Skills

Languages: Python, SQL, R, MATLAB
Machine Learning: PyTorch, scikit-learn, XGBoost, CatBoost, LSTM, CNNs, HRNet, anomaly detection
Statistical Methods: Causal inference, hypothesis testing, experimental design, Extreme Value Theory, GLMs, backtesting
Data Engineering: PySpark, Parquet, ELT pipelines, Medallion architecture (Bronze/Silver/Gold), multi-source integration
Visualization & Explainability: SHAP, Streamlit, Tableau, Plotly, matplotlib, seaborn
Cloud & MLOps: AWS, Azure ML, Weights & Biases, Git, GitHub Actions

What Sets Me Apart

I am both a Data Scientist and an Engineer, which means I approach problems with a structured, systems-oriented mindset while staying focused on delivering data solutions with measurable outcomes. Whether it's optimizing infrastructure, detecting anomalies, or designing predictive models, my goal is to turn complex data into insights that create value for people, organizations, and communities.

🌍 Find Me Online

🔗 LinkedIn
📄 Google Scholar (search Ananta Sinha)
📫 Email: anantasinha60@gmail.com

🧠 Curious about how data, ML, and rigorous methodology intersect to build systems people can actually rely on.
