⸻
ModelAtlas is a forensic-grade, modular intelligence framework meticulously designed for parsing, enriching, auditing, and visualizing the ever-evolving landscape of foundational AI models.
Crafted for researchers, engineers, analysts, and agentic systems alike, it seamlessly bridges raw metadata with recursive enrichment and deep provenance tracking, creating an inspectable, extensible, and trust-aware knowledge layer that empowers the open model ecosystem.
Trust. Trace. Transform.
pip install -r requirements.txt && playwright install
python enrich/main.py
python -m atlas search "llama"

Create a .env file for API keys:
cp .env.example .env # or run `atlas init`

The resulting .env file holds configuration keys such as
LLM_API_KEY, OPENAI_API_KEY, HUGGING_FACE_API_KEY, and the optional
PLAYWRIGHT_BROWSERS_PATH. Populate these values before running the
enrichment pipeline or CLI tools.
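
A quick pre-flight check can confirm the keys are set before a long enrichment run. The following is a minimal sketch, not part of ModelAtlas: `missing_keys` is a hypothetical helper, and treating all three API keys as required is an assumption (the text above only marks PLAYWRIGHT_BROWSERS_PATH as optional).

```python
import os

# Key names come from the paragraph above. Treating all three API keys as
# required is an assumption; PLAYWRIGHT_BROWSERS_PATH is documented as optional.
REQUIRED_KEYS = ("LLM_API_KEY", "OPENAI_API_KEY", "HUGGING_FACE_API_KEY")


def missing_keys(env=None):
    """Return the required keys that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing configuration keys:", ", ".join(missing))
    else:
        print("All required configuration keys are set.")
```

The sketch reads the process environment, so the values in .env must already be exported or loaded by whatever mechanism the project uses.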
Run make test-deps if you need all packages for the test suite.
enrich/main.py runs the enrichment trace, and CLI commands reside in the atlas_cli/ package.
Build a container with all Python dependencies and Playwright browsers preinstalled:
docker build -t modelatlas .

Launch an interactive shell inside the container:
docker run --rm -it modelatlas

You can then run the enrichment trace or CLI tools just as on the host system:
python enrich/main.py
atlas search "llama"
⸻
The diagram below is generated from architecture.mmd. You can edit it there to update the visual representation of the system.
flowchart TD
    subgraph Trace [ModelAtlas Enrichment Trace]
        A[Ollama Scraper<br/>Collects raw metadata from Ollama.com] --> B[Raw Model Data]
        B --> C[RECURSOR-1<br/>Recursive Enrichment Agent]
        C --> D[Enriched JSON Records]
        D --> E[TrustForge<br/>Score models based on heuristic fusion]
        D --> F[TracePoint<br/>Lineage & Provenance Debugger]
        D --> G[AtlasView<br/>Visual Analytics Dashboard]
    end
    G --> H[Developer / Analyst]
    F --> H
    E --> H
- Ollama Scraper: Harvests raw model data, including tags, manifests, and configuration files.
- RECURSOR-1: Normalizes fields, infers missing data, and leverages LLMs for comprehensive enrichment.
- TrustForge: Computes trust scores by fusing heuristic metrics from multiple data sources and now runs automatically inside the enrichment trace.
- TracePoint: Tracks enrichment lineage, prompt decision paths, and source deltas for transparent provenance.
- AtlasView: A web-based dashboard enabling search, filtering, comparative analysis, and visual audits.
⸻
- Enables powerful search across enriched model metadata fields.
- Supports embeddings, advanced filters, and fuzzy matching techniques.
- Example usage:
atlas search "open model for code completion"
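
To illustrate the fuzzy-matching idea only, here is a minimal sketch built on the standard library's `difflib`. It is not the `atlas` implementation: the `name` and `description` fields, the `search` and `fuzzy_score` helpers, and the 0.6 threshold are all assumptions, and embedding-based search is not shown.

```python
from difflib import SequenceMatcher


def fuzzy_score(query, text):
    """Best similarity in [0, 1] between the query and any word window of `text`."""
    query = query.lower()
    words = text.lower().split()
    width = max(1, len(query.split()))
    # Slide a window as wide as the query over the text, word by word.
    windows = [
        " ".join(words[i:i + width])
        for i in range(max(1, len(words) - width + 1))
    ]
    return max(SequenceMatcher(None, query, window).ratio() for window in windows)


def search(records, query, threshold=0.6):
    """Rank records whose name or description fuzzily matches the query."""
    scored = []
    for record in records:
        # Hypothetical field names; the real schema lives in schema.md.
        text = f"{record.get('name', '')} {record.get('description', '')}"
        score = fuzzy_score(query, text)
        if score >= threshold:
            scored.append((score, record))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored]
```

Because matching is by character similarity, a misspelled query such as "code completon" still ranks a record described as a "code completion model" first.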
- Aggregates and fuses diverse metrics including:
  - License compliance and compatibility
  - Download statistics and popularity
  - Upstream lineage and provenance
  - LLM-inferred risk assessments
- Produces a comprehensive trust_score for each model.
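
One simple way to fuse such metrics is a weighted average. The sketch below assumes that approach; the source does not say how TrustForge combines its inputs, so the weights, the metric keys, and the `trust_score` function shown here are hypothetical. Each input is expected in [0, 1] with higher meaning more trustworthy, so an LLM-inferred risk value would be passed as `1 - risk`.

```python
# Hypothetical weights: the source lists the metric families but not how
# they are combined.
WEIGHTS = {
    "license": 0.3,
    "popularity": 0.2,
    "lineage": 0.3,
    "low_risk": 0.2,
}


def trust_score(metrics, weights=WEIGHTS):
    """Fuse per-metric scores in [0, 1] into one weighted average.

    Metrics missing from `metrics` are skipped and the remaining weights are
    renormalized, so a sparse record is not scored as if the gap were a zero.
    Returns None when no known metric is present.
    """
    total = 0.0
    weight_sum = 0.0
    for name, weight in weights.items():
        value = metrics.get(name)
        if value is None:
            continue
        clamped = min(1.0, max(0.0, float(value)))
        total += weight * clamped
        weight_sum += weight
    return round(total / weight_sum, 3) if weight_sum else None
```

Renormalizing over the available metrics is a design choice; an alternative is to treat missing data as a penalty, which rewards metadata completeness.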
- Parses manifests and configuration blobs to extract detailed metadata.
- Enriches attributes such as context length, base model lineage, quantization details, and architecture specifics.
- Suggests tasks.yml patches to correct or complete missing fields.
- Employs LLMs to intelligently infer and validate metadata where necessary.
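
Finding which fields still need enrichment can be sketched as below. This is an illustration, not the RECURSOR-1 logic: the field names are assumptions derived from the attributes listed above, and the sketch does not produce tasks.yml patches, only the gaps that such a patch would address.

```python
# Hypothetical field names based on the attributes listed above; the real
# names are defined in schema.md.
EXPECTED_FIELDS = ("context_length", "base_model", "quantization", "architecture")


def missing_fields(record):
    """List expected fields that are absent, None, or empty in a record."""
    return [
        field
        for field in EXPECTED_FIELDS
        if record.get(field) in (None, "", [], {})
    ]


def completeness(record):
    """Fraction of expected fields that are populated, from 0.0 to 1.0."""
    return 1 - len(missing_fields(record)) / len(EXPECTED_FIELDS)
```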
- Enables deep inspection of any model's provenance by tracing:
  - Original raw scrape data
  - Config blob origins
  - Step-by-step enrichment history
  - Prompt decision trees and rationale
- Usage example:
tracepoint llama3:8b --lineage
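
The ancestry part of a lineage trace can be pictured as following parent links until they run out. The sketch below is a hedged illustration rather than TracePoint's code: the `base_model` field, the `lineage` function, and the dictionary-of-records shape are assumptions.

```python
def lineage(models, name):
    """Follow hypothetical `base_model` links from `name` toward its root.

    `models` maps a model name to its record. The walk stops when a record
    has no `base_model` or when the named ancestor is not in `models`.
    """
    chain = []
    seen = set()
    current = name
    while current is not None and current in models:
        if current in seen:
            # Guard against malformed data that loops back on itself.
            raise ValueError(f"cycle in lineage at {current!r}")
        seen.add(current)
        chain.append(current)
        current = models[current].get("base_model")
    return chain
```

For records such as `{"example-chat": {"base_model": "example-base"}, "example-base": {}}` (placeholder names), the walk from `example-chat` visits `example-chat` and then `example-base`.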
- Developed with Tailwind CSS and Recharts for responsive and interactive visualizations.
- Presents:
  - Model landscape visualizations by size, trust score, and license type
  - Detailed lineage trees illustrating model ancestry
  - Metadata completeness and quality indicators
Note: The dashboard code is currently under active development and is not yet included in this repository. It will be released in a future update.
modelatlas/
├── atlas_cli/               # CLI tool for semantic search and inspection
├── enrich/                  # Recursive enrichers and prompt injectors
├── trustforge/              # Trust scoring engine and heuristics
├── recursor/                # Autonomous enrichment agent logic
├── tracepoint/              # Model inspection and audit trail tools
├── dashboards/              # React-based frontend UI components
├── data/
│   ├── models_raw.json      # Raw, unprocessed scrape data
│   └── models_enriched.json # Post-enrichment metadata output
├── docs/
│   ├── naming.md
│   ├── schema.md
│   ├── usage_examples.md
│   └── PHASE_2_DESIGN.md
├── AGENTS.md
├── tasks.yml
├── README.md
└── atlas.config.json
⸻
| Layer | Technology Stack |
|---|---|
| Backend | Python (requests, asyncio, typer) |
| Dashboard | React, Tailwind CSS, Recharts |
| LLMs | OpenAI, DeepSeek, Gemma, Ollama (local) |
| CLI UX | typer, rich, fuzzy search |
| Storage | JSON (canonical), YAML (tasks), Git |
| Agents | RECURSOR-1 + Codex-style patchers |
| DevOps | GitHub Actions, local runners |
⸻
# Run enrichment trace (includes trust scoring)
python enrich/main.py
# Perform semantic search for multilingual open-license models
atlas search "multilingual open license"
# Trace enrichment lineage for a specific model
tracepoint gemma:2b --lineage
# Launch the interactive dashboard locally
cd dashboards && npm run dev

All HTTP requests made by the scrapers are cached in .cache/http.sqlite by default. Use --no-cache to disable caching.
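
The caching behavior can be pictured with a small SQLite-backed lookup table. This is a sketch under stated assumptions, not the scrapers' actual cache: the `HttpCache` class, the `responses` table, and its columns are hypothetical, and the `use_cache` argument merely mirrors what `--no-cache` is described as doing.

```python
import sqlite3
import urllib.request


class HttpCache:
    """Tiny URL -> response-body cache backed by SQLite."""

    def __init__(self, path=":memory:", fetch=None):
        # Pass a file path (its directory must already exist) to persist
        # the cache between runs; the default keeps it in memory.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS responses (url TEXT PRIMARY KEY, body BLOB)"
        )
        self.fetch = fetch or self._fetch

    @staticmethod
    def _fetch(url):
        """Download the raw response body for `url`."""
        with urllib.request.urlopen(url, timeout=30) as response:
            return response.read()

    def get(self, url, use_cache=True):
        """Return the body for `url`, fetching only on a cache miss."""
        if use_cache:
            row = self.db.execute(
                "SELECT body FROM responses WHERE url = ?", (url,)
            ).fetchone()
            if row is not None:
                return row[0]
        body = self.fetch(url)
        self.db.execute(
            "INSERT OR REPLACE INTO responses (url, body) VALUES (?, ?)",
            (url, body),
        )
        self.db.commit()
        return body
```

Injecting `fetch` keeps the sketch testable without network access; a fetch made with `use_cache=False` still refreshes the stored copy.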
Execute the test suite with pytest from the repository root:
pytest

This repository uses Git LFS to version large JSON artifacts. Run the following commands after cloning:
git lfs install
git lfs pull

All files in data/ and enriched_outputs/ are tracked via LFS, so new assets in these directories are stored automatically.
⸻
| Subsystem | Designation | Role Description |
|---|---|---|
| Full System | ModelAtlas | The overarching meta-system |
| Enrichment | Recursor | Autonomous recursive enrichment agent |
| Trust Engine | TrustForge | Assigns and computes trust scores |
| Lineage Tool | TracePoint | Debugs provenance and lineage |
| Dashboard | AtlasView | Frontend user interface |
| CLI | atlas-cli | Search and inspection command-line tool |
⸻
| Documentation File | Purpose |
|---|---|
| AGENTS.md | Details on enrichment agents, memory, and state logic |
| tasks.yml | Canonical task graph defining enrichment traces |
| naming.md | Naming philosophy and conventions |
| schema.md | Data schema specification for enriched model entries |
| usage_examples.md | Real-world CLI traces and usage patterns |
| PHASE_2_DESIGN.md | Design notes on manifest decoding, tag repair, and scoring implementation |
| CODE_OF_CONDUCT.md | Community expectations and enforcement policy |
| SECURITY.md | How to report vulnerabilities |
⸻
ModelAtlas is founded on these core principles:
- Transparency over Obfuscation
- Recursive Enrichment is Integral, Not Optional
- Trust Must Be Quantifiable and Measurable
- LLMs Are Tools That Can Self-Improve and Assist
We hold that metadata is critical infrastructure, and that systems should be able to explain their own construction with clarity and rigor.
Map the modelscape. Trace the truth. Shape the future.
Welcome to the Atlas.
This project is licensed under the MIT License.