ananta-mimo/README.md

Hi, I'm Ananta Sinha, Ph.D.

Data Scientist | Production ML & Computer Vision | Ph.D., Engineering


I am a Data Scientist with a Ph.D. in Engineering and 5+ years of experience designing and deploying production ML systems across computer vision, time-series forecasting, and causal inference. I work end-to-end, from feature engineering and model validation through cloud deployment and stakeholder communication.

My background bridges rigorous statistical research and applied data science, with peer-reviewed publications and production systems running across federally funded and commercial analytics projects.

🔭 Currently working on: gwnbr, an open-source Python package for Geographically Weighted Negative Binomial Regression, alongside a JOSS submission and an applied case study.

What I Do

  • Machine Learning & AI Solutions: Build and deploy ML pipelines for classification, prediction, and anomaly detection using Python, PyTorch, scikit-learn, and Spark.
  • Time-Series & Forecasting: Develop predictive models for demand, system performance, and rare-event risks using ARIMA, SARIMAX, LSTM, and Extreme Value Theory.
  • Computer Vision at Production Scale: Design and deploy CNN and HRNet pipelines on cloud infrastructure for feature extraction from high-resolution imagery.
  • Causal Inference & Statistical Modeling: Apply causal inference, hypothesis testing, and experimental design to quantify drivers of behavioral and operational outcomes.
  • Data-Driven Decision Support: Translate analytical results into clear strategies using dashboards, SHAP-based explainability, and storytelling for technical and non-technical stakeholders alike.
  • End-to-End Data Science Lifecycle: Work across data acquisition, ETL, feature engineering, model validation, deployment, and monitoring in both cloud and on-prem environments.

Featured Open-Source Work

gwnbr is a modular Python package implementing Geographically Weighted Negative Binomial Regression (GWNBR). It translates a SAS macro by Silva & Rodrigues (2014) into an open-source, peer-reviewable tool.

  • Three model classes (GWNBR, GWNBRg, GWPR), multiple kernel functions, Golden Section Search bandwidth selection, Newton-Raphson and IRLS solvers, and a 28-test pytest suite
  • Validated on a 1,460-unit spatial study; GWNBR outperformed GWPR by a wide margin (AICc 14,147 vs. 38,664) on data with ~42x overdispersion
  • MIT licensed, archived on Zenodo (DOI: 10.5281/zenodo.21041972), JOSS submission in progress
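
For readers unfamiliar with Golden Section Search, the idea behind bandwidth selection can be sketched as a one-dimensional minimization of a model-selection criterion over candidate bandwidths. The snippet below is a generic illustration only and is not taken from gwnbr: the function names and the toy criterion are hypothetical stand-ins, and a real criterion (such as AICc from a fitted local model) would replace `toy_criterion`.

```python
import math


def golden_section_minimize(f, lo, hi, tol=1e-6, max_iter=200):
    """Minimize a unimodal function f on [lo, hi] by golden section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # reciprocal of the golden ratio, ~0.618
    a, b = lo, hi
    # Two interior probe points placed at golden-ratio positions.
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(max_iter):
        if abs(b - a) < tol:
            break
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new upper probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new lower probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


def toy_criterion(bandwidth):
    """Hypothetical stand-in for a criterion such as AICc; minimized at 3.2."""
    return (bandwidth - 3.2) ** 2 + 10.0


best_bw = golden_section_minimize(toy_criterion, 0.5, 10.0)
print(f"selected bandwidth: {best_bw:.4f}")
```

Each iteration reuses one of the two previous criterion evaluations, which matters when every evaluation means refitting a spatial model.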

Projects

  • Computer Vision at Production Scale: Built and deployed an HRNet inference pipeline with a custom sliding 3×3 tile radial weighting strategy on Azure, cutting data collection costs by 80% ($100K+ to under $20K) and turnaround time from months to two weeks.
  • Multi-Modal CNN Pipeline: Designed an end-to-end CNN pipeline (PyTorch, Azure ML) integrating imagery, geospatial, and text data, achieving 92% feature extraction accuracy across 20+ jurisdictions.
  • Anomaly Detection & Rare-Event Forecasting: Developed LSTM/ARIMA anomaly detection and Extreme Value Theory frameworks on large continuous sensor streams, cutting calibration costs by 65% and reducing risk exposure by 70%.
  • Causal Inference for Behavioral Modeling: Applied causal inference methods to quantify marginal contributions of exogenous signals on behavioral outcomes, generating feature importance insights for decision support systems.
  • Binary Classification on Behavioral Data: Built logistic regression and XGBoost models on multi-signal behavioral and sensor datasets, validated with ROC/AUC and confusion matrix analysis, improving predictive accuracy by 40%.
  • Behavioral Modeling for Policy Decisions: Designed and validated regression-based models of driver-pedestrian interactions to guide infrastructure investment and safety improvements.
  • Freight & Logistics Analytics: Built demand models for truck parking, freight generators, and corridor bottlenecks using spatial-temporal analysis and big data sources (WIM, INRIX, Replica).

Technical Skills

  • Languages: Python, SQL, R, MATLAB
  • Machine Learning: PyTorch, scikit-learn, XGBoost, CatBoost, LSTM, CNNs, HRNet, anomaly detection
  • Statistical Methods: Causal inference, hypothesis testing, experimental design, Extreme Value Theory, GLMs, backtesting
  • Data Engineering: PySpark, Parquet, ELT pipelines, Medallion architecture (Bronze/Silver/Gold), multi-source integration
  • Visualization & Explainability: SHAP, Streamlit, Tableau, Plotly, matplotlib, seaborn
  • Cloud & MLOps: AWS, Azure ML, Weights & Biases, Git, GitHub Actions

What Sets Me Apart

I am both a Data Scientist and an Engineer, which means I approach problems with a structured, systems-oriented mindset while staying focused on delivering data solutions with measurable outcomes. Whether it's optimizing infrastructure, detecting anomalies, or designing predictive models, my goal is to turn complex data into insights that create value for people, organizations, and communities.

🌍 Find Me Online


🧠 Curious about how data, ML, and rigorous methodology intersect to build systems people can actually rely on.

Pinned

  1. I66-congestion-explorer-streamlit

    Interactive Streamlit app exploring short-term congestion forecasts on I-66 Inside the Beltway, Northern Virginia. Select direction, hour, and forecast horizon to see TTI predictions, congestion st…

    Python

  2. telco-churn-xgboost-shap-causal

    End-to-end churn analytics: Telco customer churn prediction using XGBoost, SHAP explainability, and causal inference (Double Machine Learning) with an interactive Streamlit dashboard.

    Jupyter Notebook

  3. I66-congestion-predictability

    Short-term freeway congestion forecasting using probe-based TTI data on I-66 ITB, Northern Virginia. Establishes an endogenous predictability ceiling across 41 TMCs at 5, 15, and 30-minute horizons…

    Jupyter Notebook

  4. LSTM_AADT_forecast_for-bridges

    Multi-year AADT forecasting using a stacked LSTM network. Modular Python pipeline with CLI training, early stopping, and regression evaluation (MAE, RMSE, MAPE). Includes EDA notebook and sample da…

    Jupyter Notebook

  5. gwnbr

    Geographically Weighted Negative Binomial Regression in Python: local spatial modeling for overdispersed count data.

    Python