TanML: Automated Model Validation Toolkit for Tabular Machine Learning

Automated validation toolkit for tabular ML models in finance and regulated domains.

Watch the TanML Demo
🎥 Watch the 5-Minute TanML Walkthrough on YouTube →
(End-to-end demo of UI, validation checks, and automated report generation)

TanML validates tabular ML models with a zero-config Streamlit UI and exports an audit-ready, editable Word report (.docx). It covers data quality, correlation/VIF, performance, explainability (SHAP), and robustness/stress tests—built for regulated settings (MRM, credit risk, insurance, etc.).

  • Status: Beta (0.x)
  • License: MIT
  • Python: 3.8–3.12
  • OS: Linux / macOS / Windows (incl. WSL)

Table of Contents

  • Why TanML?
  • Install
  • Quick Start (UI)
  • What TanML Checks
  • Optional CLI Flags
  • Templates
  • Troubleshooting
  • Data Privacy
  • Contributing
  • License & Citation

Why TanML?

  • Zero-config UI: launch Streamlit, upload data, click Run—no YAML needed.
  • Audit-ready outputs: tables/plots + a polished DOCX your stakeholders can edit.
  • Regulatory alignment: supports common Model Risk Management themes (e.g., SR 11-7 style).
  • Works with your stack: scikit-learn, XGBoost/LightGBM/CatBoost, etc.

Install

pip install tanml

Quick Start (UI)

tanml ui

In the app

  1. Load data — upload a cleaned CSV/XLSX/Parquet (optional: raw or separate Train/Test).
  2. Select target & features — target auto-suggested; features default to all non-target columns.
  3. Pick a model — choose library/algorithm (scikit-learn, XGBoost, LightGBM, CatBoost) and tweak params.
  4. Run validation — click ▶️ Refit & validate.
  5. Export — click ⬇️ Download report to get a DOCX (auto-selects classification/regression template).
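For step 1, a cleaned file (and an optional Train/Test split) can be prepared with pandas. A minimal sketch — the toy data, column names, and file names are illustrative assumptions, not TanML requirements:

```python
# Sketch: clean a dataset and write an optional train/test split for upload.
# The toy frame and the "default" target column are illustrative only.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "income":  [30, 52, 47, 61, 22, 75, 40, 58, 33, 49],
    "age":     [25, 41, 37, 52, 23, 60, 31, 45, 28, 39],
    "country": ["US"] * 10,               # constant column -> dropped below
    "default": [1, 0, 0, 0, 1, 0, 1, 0, 1, 0],
})

# Basic cleaning: drop exact duplicates and constant columns
df = df.drop_duplicates()
df = df.loc[:, df.nunique(dropna=False) > 1]

# Optional separate Train/Test files for the UI
# (for large data, prefer df.to_parquet(...) — it loads much faster than CSV)
train, test = train_test_split(df, test_size=0.2, random_state=42,
                               stratify=df["default"])
train.to_csv("train.csv", index=False)
test.to_csv("test.csv", index=False)
```

Either upload the single cleaned file and let TanML split it, or upload the Train/Test pair separately.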

Outputs

  • Report: ./.ui_runs/<session>/tanml_report_*.docx
  • Artifacts (CSV/PNGs): ./.ui_runs/<session>/artifacts/*

What TanML Checks

  • Raw Data (optional): rows/cols, missingness, duplicates, constant columns

  • Data Quality & EDA: summaries, distributions

  • Correlation & Multicollinearity: heatmap, top-pairs CSV, VIF table

  • Performance

    • Classification: AUC, PR-AUC, KS, decile lift, confusion matrix
    • Regression: R², MAE, MSE/RMSE, error stats
  • Explainability: SHAP (auto explainer; configurable background size)

  • Robustness/Stress Tests: feature perturbations → delta-metrics

  • Model Metadata: model class, hyperparameters, features, training info
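The performance and robustness ideas above can be reproduced outside TanML for intuition. A sketch on synthetic data — the model choice, noise scale, and names are illustrative, not TanML's internal implementation:

```python
# Sketch: a few of the listed checks on a toy model.
# Synthetic data and logistic regression are illustrative choices only.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2000, n_features=5, random_state=0)
model = LogisticRegression().fit(X, y)
scores = model.predict_proba(X)[:, 1]

# Performance: AUC, plus KS as the maximum separation between the score
# distributions of positives and negatives
auc = roc_auc_score(y, scores)
ks = ks_2samp(scores[y == 1], scores[y == 0]).statistic

# Robustness/stress: 1-sigma Gaussian noise on each feature -> AUC delta
rng = np.random.default_rng(0)
auc_drop = {}
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] += rng.normal(0.0, X[:, j].std(), size=len(X))
    auc_drop[f"x{j}"] = auc - roc_auc_score(y, model.predict_proba(Xp)[:, 1])

print(f"AUC={auc:.3f}  KS={ks:.3f}")
print({k: round(v, 4) for k, v in auc_drop.items()})
```

Features whose perturbation causes the largest AUC drop are the ones the model leans on most — the same signal TanML's delta-metrics surface.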


Optional CLI Flags

Most users only need tanml ui. The following flags help on shared servers, CI, or team setups:

# Share on LAN
tanml ui --public

# Different port
tanml ui --port 9000

# Headless (server/CI; no auto-open browser)
tanml ui --headless

# Larger limit (e.g., 2 GB)
tanml ui --max-mb 2048

Env var equivalents (Linux/macOS bash):

TANML_SERVER_ADDRESS=0.0.0.0 TANML_PORT=9000 TANML_MAX_MB=2048 tanml ui

Windows PowerShell:

$env:TANML_SERVER_ADDRESS="0.0.0.0"; $env:TANML_PORT="9000"; $env:TANML_MAX_MB="2048"; tanml ui

Defaults: address 127.0.0.1, port 8501, limit 1024 MB, telemetry OFF.


Templates

TanML ships DOCX templates (packaged in wheel & sdist):

  • tanml/report/templates/report_template_cls.docx
  • tanml/report/templates/report_template_reg.docx

Data Privacy

  • TanML runs locally; no data is sent to external services.
  • Telemetry is disabled by default (and can be forced off via --no-telemetry).
  • UI artifacts and reports are written under ./.ui_runs/<session>/ in your working directory.

Troubleshooting

  • Page didn’t open? Visit http://127.0.0.1:8501 or run tanml ui --port 9000.
  • Large CSVs are slow/heavy? Prefer Parquet; CSV → DataFrame can use several GB RAM.
  • Artifacts missing? Check ./.ui_runs/<session>/artifacts/.
  • Corporate networks: use tanml ui --public to share on LAN.
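Relatedly, a session's artifacts can be listed programmatically from the paths above (the glob pattern is an assumption about the directory layout):

```python
from pathlib import Path

# Collect every artifact written by UI sessions under the working directory
artifacts = sorted(Path(".ui_runs").glob("*/artifacts/*"))
for p in artifacts:
    print(p)

if not artifacts:
    print("No artifacts yet - run a validation first.")
```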

Contributing

We welcome issues and PRs!

  • Create a virtual environment and install dev extras:
    • python -m venv .venv && source .venv/bin/activate (or \.venv\Scripts\activate on Windows)
    • pip install -e ".[dev]" (quoted so shells like zsh don't glob the extras)
  • Format/lint: black . && isort .
  • Run tests: pytest

Before opening a PR, please describe the change and include a brief test or reproduction steps where applicable.


License & Citation

License: MIT. See LICENSE.
SPDX-License-Identifier: MIT

© 2025 Tanmay Sah and Dolly Sah. You may use, modify, and distribute this software with appropriate attribution.

How to cite

If TanML helps your work or publications, please cite:

Sah, T., & Sah, D. (2025). TanML: Automated Model Validation Toolkit for Tabular Machine Learning [Software]. Zenodo. https://doi.org/10.5281/zenodo.17317165

The software is available at https://github.com/tdlabs-ai/tanml.

Or in BibTeX (version-agnostic):

@software{tanml_2025,
  author       = {Sah, Tanmay and Sah, Dolly},
  title        = {TanML: Automated Model Validation Toolkit for Tabular Machine Learning},
  year         = {2025},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.17317165},
  url          = {https://doi.org/10.5281/zenodo.17317165},
  license      = {MIT}
}

A machine-readable citation file (CITATION.cff) is included for citation tools and GitHub’s “Cite this repository” button.
