TanML: Automated Model Validation Toolkit for Tabular Machine Learning

Automated validation toolkit for tabular ML models in finance and regulated domains.

Watch the TanML Demo
🎥 Watch the 5-Minute TanML Walkthrough on YouTube →
(End-to-end demo of UI, validation checks, and automated report generation)

TanML validates tabular ML models with a zero-config Streamlit UI and exports an audit-ready, editable Word report (.docx). It covers data quality, correlation/VIF, performance, explainability (SHAP), and robustness/stress tests—built for regulated settings (MRM, credit risk, insurance, etc.).

  • Status: Beta (0.x)
  • License: MIT
  • Python: 3.8–3.12
  • OS: Linux / macOS / Windows (incl. WSL)

Table of Contents

  • Why TanML?
  • Install
  • Quick Start (UI)
  • What TanML Checks
  • Optional CLI Flags
  • Templates
  • Troubleshooting
  • Data Privacy
  • Contributing
  • License & Citation

Why TanML?

  • Zero-config UI: launch Streamlit, upload data, click Run—no YAML needed.
  • Audit-ready outputs: tables/plots + a polished DOCX your stakeholders can edit.
  • Regulatory alignment: supports common Model Risk Management themes (e.g., SR 11-7 style).
  • Works with your stack: scikit-learn, XGBoost/LightGBM/CatBoost, etc.

Install

pip install tanml

Quick Start (UI)

tanml ui

In the app

  1. Load data — upload a cleaned CSV/XLSX/Parquet (optional: raw or separate Train/Test).
  2. Select target & features — target auto-suggested; features default to all non-target columns.
  3. Pick a model — choose library/algorithm (scikit-learn, XGBoost, LightGBM, CatBoost) and tweak params.
  4. Run validation — click ▶️ Refit & validate.
  5. Export — click ⬇️ Download report to get a DOCX (auto-selects classification/regression template).
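For step 1, a cleaned file (and an optional Train/Test split) can be prepared with pandas. A minimal sketch — the toy data, column names, and file names are illustrative assumptions, not TanML requirements:

```python
# Sketch: clean a dataset and write an optional train/test split for upload.
# The toy frame and the "default" target column are illustrative only.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "income":  [30, 52, 47, 61, 22, 75, 40, 58, 33, 49],
    "age":     [25, 41, 37, 52, 23, 60, 31, 45, 28, 39],
    "country": ["US"] * 10,               # constant column -> dropped below
    "default": [1, 0, 0, 0, 1, 0, 1, 0, 1, 0],
})

# Basic cleaning: drop exact duplicates and constant columns
df = df.drop_duplicates()
df = df.loc[:, df.nunique(dropna=False) > 1]

# Optional separate Train/Test files for the UI
# (for large data, prefer df.to_parquet(...) — it loads much faster than CSV)
train, test = train_test_split(df, test_size=0.2, random_state=42,
                               stratify=df["default"])
train.to_csv("train.csv", index=False)
test.to_csv("test.csv", index=False)
```

Either upload the single cleaned file and let TanML split it, or upload the Train/Test pair separately.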

Outputs

  • Report: ./.ui_runs/<session>/tanml_report_*.docx
  • Artifacts (CSV/PNGs): ./.ui_runs/<session>/artifacts/*

What TanML Checks

  • Raw Data (optional): rows/cols, missingness, duplicates, constant columns

  • Data Quality & EDA: summaries, distributions

  • Correlation & Multicollinearity: heatmap, top-pairs CSV, VIF table

  • Performance

    • Classification: AUC, PR-AUC, KS, decile lift, confusion matrix
    • Regression: R², MAE, MSE/RMSE, error stats
  • Explainability: SHAP (auto explainer; configurable background size)

  • Robustness/Stress Tests: feature perturbations → delta-metrics

  • Model Metadata: model class, hyperparameters, features, training info
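The performance and robustness ideas above can be reproduced outside TanML for intuition. A sketch on synthetic data — the model choice, noise scale, and names are illustrative, not TanML's internal implementation:

```python
# Sketch: a few of the listed checks on a toy model.
# Synthetic data and logistic regression are illustrative choices only.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2000, n_features=5, random_state=0)
model = LogisticRegression().fit(X, y)
scores = model.predict_proba(X)[:, 1]

# Performance: AUC, plus KS as the maximum separation between the score
# distributions of positives and negatives
auc = roc_auc_score(y, scores)
ks = ks_2samp(scores[y == 1], scores[y == 0]).statistic

# Robustness/stress: 1-sigma Gaussian noise on each feature -> AUC delta
rng = np.random.default_rng(0)
auc_drop = {}
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] += rng.normal(0.0, X[:, j].std(), size=len(X))
    auc_drop[f"x{j}"] = auc - roc_auc_score(y, model.predict_proba(Xp)[:, 1])

print(f"AUC={auc:.3f}  KS={ks:.3f}")
print({k: round(v, 4) for k, v in auc_drop.items()})
```

Features whose perturbation causes the largest AUC drop are the ones the model leans on most — the same signal TanML's delta-metrics surface.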


Optional CLI Flags

Most users only need tanml ui. The following flags help on shared servers, CI, or team setups:

# Share on LAN
tanml ui --public

# Different port
tanml ui --port 9000

# Headless (server/CI; no auto-open browser)
tanml ui --headless

# Larger limit (e.g., 2 GB)
tanml ui --max-mb 2048

Env var equivalents (Linux/macOS bash):

TANML_SERVER_ADDRESS=0.0.0.0 TANML_PORT=9000 TANML_MAX_MB=2048 tanml ui

Windows PowerShell:

$env:TANML_SERVER_ADDRESS="0.0.0.0"; $env:TANML_PORT="9000"; $env:TANML_MAX_MB="2048"; tanml ui

Defaults: address 127.0.0.1, port 8501, limit 1024 MB, telemetry OFF.


Templates

TanML ships DOCX templates (packaged in wheel & sdist):

  • tanml/report/templates/report_template_cls.docx
  • tanml/report/templates/report_template_reg.docx

Data Privacy

  • TanML runs locally; no data is sent to external services.
  • Telemetry is disabled by default (and can be forced off via --no-telemetry).
  • UI artifacts and reports are written under ./.ui_runs/<session>/ in your working directory.

Troubleshooting

  • Page didn’t open? Visit http://127.0.0.1:8501 or run tanml ui --port 9000.
  • Large CSVs are slow/heavy? Prefer Parquet; CSV → DataFrame can use several GB RAM.
  • Artifacts missing? Check ./.ui_runs/<session>/artifacts/.
  • Corporate networks: use tanml ui --public to share on LAN.
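Relatedly, a session's artifacts can be listed programmatically from the paths above (the glob pattern is an assumption about the directory layout):

```python
from pathlib import Path

# Collect every artifact written by UI sessions under the working directory
artifacts = sorted(Path(".ui_runs").glob("*/artifacts/*"))
for p in artifacts:
    print(p)

if not artifacts:
    print("No artifacts yet - run a validation first.")
```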

Contributing

We welcome issues and PRs!

  • Create a virtual environment and install dev extras:
    • python -m venv .venv && source .venv/bin/activate (or \.venv\Scripts\activate on Windows)
    • pip install -e ".[dev]" (quoted so shells like zsh don't glob the extras)
  • Format/lint: black . && isort .
  • Run tests: pytest

Before opening a PR, please describe the change and include a brief test or reproduction steps where applicable.


License & Citation

License: MIT. See LICENSE.
SPDX-License-Identifier: MIT

© 2025 Tanmay Sah and Dolly Sah. You may use, modify, and distribute this software with appropriate attribution.

How to cite

If TanML helps your work or publications, please cite:

Sah, T., & Sah, D. (2025). TanML: Automated Model Validation Toolkit for Tabular Machine Learning [Software]. Zenodo. https://doi.org/10.5281/zenodo.17317165

The software is available at https://github.com/tdlabs-ai/tanml.

Or in BibTeX (version-agnostic):

@software{tanml_2025,
  author       = {Sah, Tanmay and Sah, Dolly},
  title        = {TanML: Automated Model Validation Toolkit for Tabular Machine Learning},
  year         = {2025},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.17317165},
  url          = {https://doi.org/10.5281/zenodo.17317165},
  license      = {MIT}
}

A machine-readable citation file (CITATION.cff) is included for citation tools and GitHub’s “Cite this repository” button.
