# interpretability

Here are 24 public repositories matching this topic...

microsoft / responsible-ai-toolbox

Responsible AI Toolbox is a suite of tools providing model and data exploration and assessment user interfaces and libraries that enable a better understanding of AI systems. These interfaces and libraries empower developers and stakeholders of AI systems to develop and monitor AI more responsibly, and take better data-driven actions.

Updated Apr 29, 2026
TypeScript

hijohnnylin / neuronpedia

open source interpretability platform 🧠

ai interpretability

Updated May 30, 2026
TypeScript

poloclub / timbertrek

Explore and compare 1K+ accurate decision trees in your browser!

visualization decision-tree interactive-visualizations interpretability rashomon

Updated Mar 4, 2024
TypeScript

poloclub / webshap

JavaScript library to explain any machine learning models anywhere!

visualization machine-learning transformer interpretability tensorflowjs

Updated Mar 29, 2023
TypeScript

utwente-dmb / xai-papers

survey interpretability explainable-ai explainable-ml interpretable-machine-learning paper-list

Updated Aug 4, 2024
TypeScript

mlgig / video-pose-tsc

deep-learning interpretability time-series-classification interpretable-machine-learning

Updated Feb 3, 2021
TypeScript

footnote-ai / footnote

AI that shows its work. Transparent, private, and easy to run yourself.

open-source ai self-hosted provenance transparency interpretability explainable-ai auditability

Updated May 30, 2026
TypeScript

csinva / imodels-playground

Demos for visualizing how rule-based models work.

rules data-science machine-learning ai interpretability xai imodels

Updated Dec 17, 2021
TypeScript

theomalaper / genomic-vep

Non-coding variant effect predictor using Nucleotide Transformer v2 with interpretability + autonomous optimization via autoresearch

bioinformatics genomics transformer variants clinvar nucleotides interpretability foundation-models

Updated Apr 13, 2026
TypeScript

Tuesdaythe13th / H4RB1NG3R

This repository represents the transition from behavioral safety to Neural Forensics. It provides the infrastructure to detect, audit, and mitigate high-order AI risks—such as Latent Deception, Sycophancy-Masking, and Synthetic Intimacy—directly at the mechanistic activation layer.

agent machine-learning safety forensic-analysis security-tools evaluation-framework interpretability interpretable-machine-learning responsible-ai mechanistic-interpretability agentic

Updated Jan 12, 2026
TypeScript

raxITlabs / nla-audit

An experiment in monitoring what LLMs are thinking. Current implementation reads activation-level thoughts via Anthropic's Natural Language Autoencoder release.

interpretability prompt-engineering chain-of-thought mechanistic-interpretability anthropic llm-observability agent-monitoring

Updated May 13, 2026
TypeScript

OpenInterpretability / web

Next.js site for OpenInterpretability — the umbrella org for mechreward and public hybrid-architecture SAEs

open-source research transformer ai-safety sae mlx interpretability sparse-autoencoder llm mechanistic-interpretability

Updated May 27, 2026
TypeScript

rayancheca / circuit-trace

Mechanistic interpretability lab that dissects transformer attention heads and traces reasoning circuits in real time.

react transformers pytorch three-js interpretability ai-ml mechanistic-interpretability

Updated Apr 21, 2026
TypeScript

harsha-gouru / llm-inspector

Educational tool to trace and visualize local LLM internals in real-time. See tokens, attention, hidden states, and probabilities.

visualization typescript educational attention interpretability llm

Updated Feb 14, 2026
TypeScript

MaxwellCalkin / compass

An interpretive lens over a collective alignment signal. Claude-powered Next.js app that synthesizes a written Constitution from a corpus of voted visions, identifies tensions, audits proposed model behaviors, and answers researcher queries. Companion to beacn.space.

nextjs alignment ai-safety interpretability claude ai-alignment llm rlhf anthropic constitutional-ai

Updated May 19, 2026
TypeScript

blockhead22 / CRT-GroundCheck-SSE

CORE/Aether — epistemic governance for AI agents. Belief substrate with trust math, contradiction detection, and a measured belief/speech gap. The model is the mouth; the substrate is the self.

python memory mcp interpretability ai-agents contradiction-detection llm ai-governance anthropic belief-substrate

Updated May 8, 2026
TypeScript

dakshjain-1616 / RepoTracker

Background tracker for the dakshjain-1616 repo portfolio. Monitors stars, forks, traffic, and README freshness across all repos, then surfaces a dashboard of which repositories need attention. Backed by the GitHub REST API and a SQLite store.

python nlp api machine-learning ai developer-tools interpretability llm

Updated Feb 28, 2026
TypeScript

mohinpatell / watch-a-model-grok

Scroll-driven visualization of a 1-layer transformer grokking (a+b) mod 113. After Nanda et al. 2023.

visualization nextjs pytorch transformer interpretability grokking mechanistic-interpretability

Updated Apr 22, 2026
TypeScript

Serbian-groundwork422 / PrintFit

Print Markdown to a single A4 page with auto-fit font sizing, live preview, and one-click PDF export

cli outreach youtube sdk sports transparency octoprint blackbox bias differential-privacy gradient-boosting interpretability shortvideo interpretable-machine-learning tiktok explainability interpretml chatgpt

Updated Jun 1, 2026
TypeScript

Wondermonger-daydreaming / latent-space-cartographer

A visual exploration tool for understanding and navigating latent spaces in machine learning models

machine-learning interpretability sparse-autoencoder

Updated May 31, 2026
TypeScript
