Carto

Intent-aware codebase intelligence for AI assistants.

Carto scans your codebase, builds a layered semantic index using LLMs, and stores it in Memories for fast retrieval. It produces skill files (CLAUDE.md, .cursorrules) that give AI coding assistants instant, structured context about your project.

carto index .
# Scans 847 files across 3 modules in ~90 seconds
# Produces a 7-layer context graph stored in Memories
# Generates CLAUDE.md with architecture, patterns, and conventions

Quick Start

Prerequisites

  • Go 1.25 or later (with CGO support for Tree-sitter)
  • An LLM API key (Anthropic, OpenAI-compatible, or Ollama)
  • A running Memories server (default: http://localhost:8900)

Build

git clone https://github.com/divyekant/carto.git
cd carto/go
go build -o carto ./cmd/carto

Configure

export ANTHROPIC_API_KEY="sk-ant-api03-..."
# Memories server defaults to http://localhost:8900 -- override if needed:
# export MEMORIES_URL="http://your-memories-server:8900"

Run

# Index a codebase
carto index /path/to/your/project

# Query the index
carto query "How does authentication work?"

# Generate skill files for AI assistants
carto patterns /path/to/your/project --format all

How It Works

Carto builds understanding through a 5-phase pipeline that progressively layers meaning on top of raw code.

The Pipeline

Phase 1: Scan        Walks the directory tree, respects .gitignore,
                     detects module boundaries (go.mod, package.json, etc.)

Phase 2: Chunk       Tree-sitter AST parsing splits files into semantic chunks.
         + Atoms     A fast-tier LLM produces structured atom summaries for each chunk.

Phase 3: History     Extracts git history (commits, churn, ownership).
         + Signals   Plugin-based external signals (tickets, PRs, docs).

Phase 4: Deep        A deep-tier LLM analyzes cross-component wiring, identifies
         Analysis    business domain zones, and produces an architecture narrative.

Phase 5: Store       Serializes all 7 layers into Memories with source tags.
                     Saves a manifest for incremental re-indexing.

Layered Context Graph

Each layer captures a different dimension of understanding. Higher layers depend on lower ones.

Layer  Name       LLM   Description
0      Map        None  Files, modules, detected languages
1a     Atoms      Fast  Per-chunk summaries with intent and role annotations
1b     History    None  Git commits, file churn, ownership patterns
1c     Signals    None  External context from tickets, PRs, and other sources
2      Wiring     Deep  Cross-component dependency analysis
3      Zones      Deep  Business domain groupings and boundaries
4      Blueprint  Deep  System architecture narrative and design patterns

Tiered Retrieval

When querying, Carto returns context at three granularity levels:

Tier      Layers Included (cumulative)  Approximate Size
mini      Zones + Blueprint             ~5 KB
standard  + Atoms + Wiring              ~50 KB
full      + History + Signals           ~500 KB

This lets AI assistants request just enough context for the task at hand -- a quick question needs mini, a refactoring task needs full.
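
For example (query text illustrative):

carto query "Where is rate limiting enforced?" --project my-api --tier mini
carto query "Refactor the billing module" --project my-api --tier full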


CLI Reference

carto index <path>

Run the full indexing pipeline on a codebase.

carto index .                          # Index current directory
carto index /path/to/project           # Index a specific path
carto index . --incremental            # Only process changed files
carto index . --module my-service      # Index a single module
carto index . --project my-project     # Override the project name
carto index . --full                   # Force full re-index (ignore manifest)

Flag              Description
--incremental     Only re-index files that changed since the last run
--module <name>   Restrict indexing to a single detected module
--project <name>  Set the project name (defaults to directory name)
--full            Force a complete re-index, ignoring the manifest

carto query <text>

Search the indexed codebase using natural language.

carto query "How does the payment flow work?"
carto query "error handling" --project my-api --tier full
carto query "database migrations" -k 20

Flag                       Description
--project <name>           Search within a specific project (enables tiered retrieval)
--tier mini|standard|full  Context tier for project-scoped queries (default: standard)
-k <count>                 Number of results to return (default: 10)

carto modules <path>

List all detected modules and their file counts.

carto modules .

Output shows each module's name, type (go, node, rust, etc.), path, and file count.

carto patterns <path>

Generate skill files that give AI assistants structured context about your codebase.

carto patterns .                       # Generate all formats
carto patterns . --format claude       # Generate CLAUDE.md only
carto patterns . --format cursor       # Generate .cursorrules only
carto patterns . --format all          # Generate both (default)

Flag                        Description
--format claude|cursor|all  Output format (default: all)

carto status <path>

Show the current index status for a codebase.

carto status .

Displays the project name, last indexed timestamp, file count, and total indexed size.

Global Flags

carto --version                        # Print version
carto --help                           # Print help
carto <command> --help                 # Print help for a command

Configuration

Carto is configured entirely through environment variables. See .env.example for a complete template.

Variable              Required  Default                    Description
ANTHROPIC_API_KEY     Yes       --                         Anthropic API key or OAuth token
MEMORIES_URL          No        http://localhost:8900      Memories server URL
MEMORIES_API_KEY      No        --                         Memories server API key
CARTO_FAST_MODEL      No        claude-haiku-4-5-20251001  Fast-tier model for atom analysis (Phase 2)
CARTO_DEEP_MODEL      No        claude-opus-4-6            Deep-tier model for deep analysis (Phase 4)
CARTO_MAX_CONCURRENT  No        10                         Maximum concurrent LLM requests
LLM_PROVIDER          No        anthropic                  LLM provider: anthropic, openai, ollama
LLM_API_KEY           No        --                         API key for non-Anthropic providers
LLM_BASE_URL          No        --                         Base URL for non-Anthropic providers
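
A minimal .env for the default Anthropic setup might look like this (values are placeholders):

ANTHROPIC_API_KEY=sk-ant-api03-...
MEMORIES_URL=http://localhost:8900
CARTO_MAX_CONCURRENT=10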

Authentication

Carto supports two authentication methods for the Anthropic API:

  • Standard API keys (sk-ant-api03-...) -- used with the X-Api-Key header
  • OAuth tokens (sk-ant-oat01-...) -- used with Authorization: Bearer header, with automatic token refresh

The authentication method is detected automatically from the key prefix.
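
As a sketch, the prefix check might look like the following Go snippet. The function name and shape are illustrative assumptions, not Carto's actual internals (the real client lives in internal/llm):

package main

import (
    "fmt"
    "strings"
)

// authHeader picks the HTTP auth header based on the key prefix.
func authHeader(key string) (name, value string) {
    if strings.HasPrefix(key, "sk-ant-oat01-") {
        return "Authorization", "Bearer " + key // OAuth token (refreshed automatically)
    }
    return "X-Api-Key", key // standard API key
}

func main() {
    fmt.Println(authHeader("sk-ant-api03-example"))
    fmt.Println(authHeader("sk-ant-oat01-example"))
}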


Architecture

go/
  cmd/carto/              CLI entry point (Cobra commands)
  internal/
    analyzer/             Deep analysis (wiring, zones, blueprint)
    atoms/                Fast-tier atom summaries for code chunks
    chunker/              Tree-sitter AST chunking engine
    config/               Environment-based configuration loading
    history/              Git history extraction (commits, churn)
    llm/                  Multi-provider LLM client (Anthropic, OpenAI, Ollama)
    manifest/             Incremental indexing manifest (hash-based change detection)
    patterns/             Skill file generation (CLAUDE.md, .cursorrules)
    pipeline/             5-phase orchestrator wiring all components together
    scanner/              File discovery, .gitignore filtering, module detection
    signals/              Plugin-based external signal system (git, tickets, PRs)
    storage/              Memories REST client, layered storage, tiered retrieval
  web/                    React SPA dashboard (embedded in binary)

For the full architecture deep-dive, see docs/ARCHITECTURE.md.

Key Design Decisions

  • Tree-sitter for AST parsing -- provides language-aware chunking that respects function and class boundaries, rather than naive line-based splitting.
  • Two-tier LLM strategy -- the fast tier handles high-volume atom summaries (cheap), while the deep tier handles low-volume architectural analysis (thorough).
  • Layered storage with source tags -- each layer is stored with a structured source tag (carto/{project}/{module}/layer:{layer}, e.g. carto/my-api/payments/layer:2), enabling precise retrieval and cleanup.
  • Manifest-based incremental indexing -- SHA-256 hashes track file changes so subsequent runs only process what changed (sketched below).
  • Semaphore-based concurrency -- a configurable concurrency limit prevents overwhelming the LLM API with parallel requests (also sketched below).
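
To make the last two decisions concrete, here are two minimal, self-contained Go sketches. They illustrate the general techniques only; names and shapes are assumptions, not Carto's actual internal/manifest or internal/pipeline code.

Hash-based change detection against a saved manifest:

package main

import (
    "crypto/sha256"
    "encoding/hex"
    "fmt"
    "os"
)

// changed reports whether path's content hash differs from the hash recorded
// in a previous run's manifest (a map of path -> hex-encoded SHA-256).
func changed(manifest map[string]string, path string) (bool, string, error) {
    data, err := os.ReadFile(path)
    if err != nil {
        return false, "", err
    }
    sum := sha256.Sum256(data)
    h := hex.EncodeToString(sum[:])
    return manifest[path] != h, h, nil
}

func main() {
    manifest := map[string]string{} // empty manifest: everything counts as changed
    dirty, hash, err := changed(manifest, "go.mod")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    fmt.Printf("changed=%v hash=%s\n", dirty, hash)
}

And the channel-as-semaphore pattern for capping concurrent LLM requests (the limit that CARTO_MAX_CONCURRENT controls):

package main

import (
    "fmt"
    "sync"
)

func main() {
    const maxConcurrent = 10 // mirrors the CARTO_MAX_CONCURRENT default
    sem := make(chan struct{}, maxConcurrent)
    var wg sync.WaitGroup
    for i := 0; i < 50; i++ {
        wg.Add(1)
        go func(n int) {
            defer wg.Done()
            sem <- struct{}{}        // acquire one of maxConcurrent slots
            defer func() { <-sem }() // release the slot when done
            fmt.Printf("request %d in flight\n", n) // stand-in for an LLM call
        }(i)
    }
    wg.Wait()
}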

Supported Languages

Carto recognizes files in a wide range of languages. Tree-sitter grammars are bundled for the six primary languages listed below; all other languages are detected for file classification and included in the index as raw content.

Tree-Sitter AST Parsing

Language    Extensions
Go          .go
JavaScript  .js, .jsx, .mjs, .cjs
TypeScript  .ts, .tsx, .mts, .cts
Python      .py, .pyi
Java        .java
Rust        .rs

Language Detection (30+ languages)

Carto detects and classifies files across a broad set of languages including C, C++, C#, Kotlin, Ruby, Swift, Scala, PHP, Dart, Elixir, Erlang, Haskell, OCaml, Clojure, Lua, Zig, R, and more. It also recognizes configuration formats (JSON, YAML, TOML, XML, Protobuf, Terraform), web languages (HTML, CSS, SCSS, Vue, Svelte, GraphQL), documentation (Markdown, reStructuredText), SQL, and shell scripts.

Module Detection

Carto automatically identifies project boundaries by looking for manifest files:

Manifest                         Module Type
go.mod                           Go
package.json                     Node.js
Cargo.toml                       Rust
pom.xml                          Java (Maven)
build.gradle / build.gradle.kts  Java (Gradle)
pyproject.toml / setup.py        Python

If no manifest files are found, the entire directory is treated as a single module.
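
As a sketch, the table above amounts to a filename lookup (illustrative only; Carto's real detection lives in internal/scanner):

package main

import "fmt"

// manifestTypes maps well-known manifest filenames to module types.
var manifestTypes = map[string]string{
    "go.mod":           "go",
    "package.json":     "node",
    "Cargo.toml":       "rust",
    "pom.xml":          "java",
    "build.gradle":     "java",
    "build.gradle.kts": "java",
    "pyproject.toml":   "python",
    "setup.py":         "python",
}

func main() {
    fmt.Println(manifestTypes["package.json"]) // node
}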


Web UI

Carto includes a built-in web dashboard for browsing indexed projects, exploring modules, and querying the index visually.

carto serve --port 8950 --projects-dir /path/to/projects

Open http://localhost:8950 in your browser.


Docker

cd go
cp .env.example .env   # or start from the root .env.example; fill in your keys
docker compose up -d
# UI at http://localhost:8950

Or run directly:

docker build -t carto go/
docker run -p 8950:8950 \
  -e ANTHROPIC_API_KEY="sk-ant-api03-..." \
  -e MEMORIES_URL="http://host.docker.internal:8900" \
  -v /path/to/projects:/projects \
  carto

See go/docker-compose.yml for a complete multi-service setup.


Integrations

  • QUICKSTART-LLM.md -- LLM-friendly quickstart guide for AI assistants
  • Agent Write-Back -- How to keep the index current from Claude Code, Codex, Cursor, and OpenClaw

Contributing

Contributions are welcome. Please see CONTRIBUTING.md for guidelines on submitting issues and pull requests.

Running Tests

cd go
go test ./...

Building

cd go
go build -o carto ./cmd/carto

License

MIT License. See LICENSE for details.
