Stop burning money on LLM tokens.
Get 70-95% cost reduction through local RAG, intelligent routing, and containerized security.
LLMC is a local-first RAG (Retrieval-Augmented Generation) engine and intelligent router designed to drastically reduce the cost of using Large Language Models with your codebase.
Instead of sending your entire codebase to Claude or GPT-4, LLMC indexes your code locally, finds the exact relevant snippets (functions, classes, docs), and sends only what matters.
```mermaid
graph LR
A[User Query] --> B(LLMC Router);
B --> C{Local Index};
C -->|Search| D[Relevant Context];
D -->|Trim & Pack| E[Optimized Prompt];
E --> F[LLM API];
F --> G[Answer];
style B fill:#f9f,stroke:#333,stroke-width:2px
style E fill:#bfb,stroke:#333,stroke-width:2px
```
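In rough pseudocode, the flow in the diagram above is: search the local index (no tokens spent), pack the best matches into a fixed budget, and only then build a prompt for the API. A minimal sketch — the keyword scoring, `search`, and `trim` helpers here are illustrative stand-ins, not LLMC's actual API:

```python
# Toy end-to-end sketch of the retrieve -> trim -> prompt pipeline.
# All names and scoring below are illustrative, not LLMC internals.

def search(index, query, top_k=3):
    """Score snippets by naive keyword overlap with the query."""
    words = query.lower().split()
    scored = [(sum(w in text.lower() for w in words), text) for text in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for score, text in scored[:top_k] if score > 0]

def trim(snippets, budget_words):
    """Pack snippets into a fixed budget; drop whatever doesn't fit."""
    packed, used = [], 0
    for s in snippets:
        n = len(s.split())
        if used + n <= budget_words:
            packed.append(s)
            used += n
    return "\n".join(packed)

index = [
    "def check_token(req): return verify(req.headers['Auth'])",
    "class AuthMiddleware: wraps every request with check_token",
    "Makefile target for building docs",
]
context = trim(search(index, "authentication middleware"), budget_words=20)
prompt = f"Context:\n{context}\n\nQuestion: how does auth work?"
```

A real implementation would swap the keyword overlap for embedding similarity and count tokens with the target model's tokenizer, but the shape — search locally, trim to budget, then call out — is the same.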
Get up and running in seconds.

```bash
# One-line install
curl -sSL https://raw.githubusercontent.com/vmlinuzx/llmc/main/install.sh | bash

# Or via pip
pip install "git+https://github.com/vmlinuzx/llmc.git#egg=llmcwrapper[rag,tui,agent]"
```

```bash
cd /path/to/your/project
llmc-cli repo register .

# Search without using ANY tokens
llmc-cli search "authentication middleware"

# Launch the visual dashboard
llmc-cli tui
```

| Feature | Description |
|---|---|
| 💸 Massive Savings | Reduces token usage by 70-95% by sending only relevant context. |
| 🔒 Security First | New in v0.7.0: "Hybrid Mode" for trusted clients (host access) vs. Container Isolation for untrusted LLMs. |
| 🧠 Polyglot RAG | Smart parsing (TreeSitter) for Python, TS, JS, Go, Java, and technical docs. |
| 🕸️ GraphRAG | Understands your code structure (imports, calls, inheritance) to find related files automatically. |
| 🖥️ TUI Dashboard | Terminal UI to monitor indexing, search results, and costs. |
| 🔌 MCP Support | Full Model Context Protocol server to integrate seamlessly with Claude Desktop. |
### 🛠️ Core RAG Engine
- Local SQLite Index: Stores text + metadata without external dependencies.
- Smart Embeddings: Caches embeddings to avoid re-computing unchanged files.
- Context Trimmer: Packs the most relevant spans into a fixed token budget.
- Enrichment: Uses small local models to tag and summarize code for better retrieval.
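The embedding cache above can be pictured as keying vectors by a hash of the file content, so unchanged files are never re-embedded. A minimal sketch — `fake_embed` and the cache layout are illustrative, not LLMC's internals:

```python
import hashlib

def fake_embed(text):
    """Stand-in for a real embedding model call (the expensive part)."""
    return [float(len(text)), float(sum(map(ord, text)) % 97)]

class EmbeddingCache:
    def __init__(self):
        self.store = {}   # content hash -> vector
        self.misses = 0   # how many times we actually paid to embed

    def get(self, text):
        key = hashlib.sha256(text.encode()).hexdigest()
        if key not in self.store:
            self.misses += 1                  # only new/changed content costs anything
            self.store[key] = fake_embed(text)
        return self.store[key]

cache = EmbeddingCache()
cache.get("def login(): ...")
cache.get("def login(): ...")   # unchanged file: cache hit, no recompute
```

Because the key is the content hash rather than the filename, a file that is touched but not modified still hits the cache.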
### 🛡️ Security & MCP
- Hybrid Mode: Trusted clients get direct host access (~76% cheaper than routing through Docker).
- Container Isolation: Untrusted inputs run in Docker/nsjail.
- Defense in Depth: Even if an LLM is "jailbroken" by prompt injection, it can't escape the container.
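As a rough illustration of the isolation idea, an untrusted tool call can be wrapped in a locked-down `docker run` invocation: no network, read-only filesystem, capped resources, all capabilities dropped. The flags below are standard Docker options chosen for illustration; they are not necessarily the exact set LLMC uses:

```python
def sandboxed_cmd(image, command):
    """Build a docker run argv that denies network, writes, and extra privileges."""
    return [
        "docker", "run", "--rm",
        "--network", "none",        # no exfiltration path for a jailbroken model
        "--read-only",              # immutable root filesystem
        "--memory", "256m",         # resource caps
        "--pids-limit", "64",       # no fork bombs
        "--cap-drop", "ALL",        # drop all Linux capabilities
        image,
    ] + command

argv = sandboxed_cmd("python:3.12-slim", ["python", "-c", "print('hi')"])
```

Even if prompt injection convinces the model to emit hostile commands, they execute inside a container with no network and no writable filesystem — that is the defense-in-depth claim above.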
### 📊 Analytics & Routing
- Intelligent Failover: Cascades from Local → Cheap Cloud → Premium Models.
- Cost Tracking: Hard budget caps to prevent surprise bills.
- Rate Limiting: Automatic token bucket throttling for API providers.
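The token-bucket throttling mentioned above refills an allowance at a fixed rate and refuses requests when the bucket is empty. A textbook token bucket, not LLMC's implementation (real code would use a monotonic clock and a lock):

```python
class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate            # tokens added per second
        self.capacity = capacity
        self.tokens = capacity      # start full to allow an initial burst
        self.last = 0.0

    def allow(self, now, cost=1):
        # Refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate=2.0, capacity=5)
burst = [bucket.allow(now=0.0) for _ in range(6)]   # 5 allowed, then throttled
later = bucket.allow(now=1.0)                        # 2 tokens refilled by t=1s
```

The `capacity` bounds burst size while `rate` bounds sustained throughput — which is why providers with strict per-minute quotas are a natural fit for this scheme.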
Full documentation is available in the DOCS/ directory:
- Getting Started — Installation and quickstart
- User Guide — Configuration and daily usage
- Operations — Running the daemon and MCP integration
- Architecture — System design and internals
- Reference — CLI, config, and MCP tool reference
Originally created by David Carroll (the worst paragliding pilot in the TX Panhandle) after burning through his weekly API limits in days, LLMC was born of the need to code more while spending less.
We welcome PRs! Please check CONTRIBUTING.md before starting.
- Fork the repo
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Current Release: v0.9.1 "Back From Vacation"