GitHub - lutelute/local-cli: Local-first AI coding agent CLI powered by Ollama (Python stdlib only, zero dependencies)

Local-first AI coding agent. Zero dependencies. Runs entirely on your machine.

Download · Features · Desktop App · CLI Usage · Configuration

AI agent autonomously creates a 2048 game — write tool in action

AI builds a 2048 solver with 3 strategies and benchmarks them

3 AI strategies compared: Random vs Heuristic vs Lookahead — game over side by side

What is this?

Local CLI is an AI coding agent that runs locally using Ollama. It can read, write, and edit files, run shell commands, search code, and fetch web pages — all through natural language.

It also supports Claude API as an alternative provider, with seamless runtime switching between local and cloud models.

Think of it as a local, offline-capable alternative to cloud-based AI coding assistants.

Features

Agent Loop

The LLM autonomously calls tools to complete tasks. It reads files, writes code, runs commands, and iterates until the task is done — no manual step-by-step prompting required.

Deterministic Harness

A single unified loop (shared by the CLI, server, web monitor, and sub-agents) wraps the model with deterministic interventions that repair the failure modes of small local models — so a 1-9B model can sustain agentic sessions that would otherwise need a frontier model:

Intervention	What it fixes
Text tool-call rescue	Models that print their tool call as text instead of a structured call still act — `<tool_call>` tags, fenced JSON, bare JSON, inlined arguments (`{"name": "write", "file_path": ...}`), tool-name-as-key (`{"write": {...}}`), and Python call syntax (`write(file_path=...)`)
No-tool-support fallback	Models whose endpoint rejects `tools` entirely (e.g. Japanese-specialized models) are taught a fenced-JSON call format and driven by text — they still work as agents
Tool-name / argument repair	Near-miss names (`write_file` → `write`, `run` → `bash`) and keys (`path` → `file_path`) are resolved instead of erroring
Loop detection	Repeated identical calls draw a corrective reminder, then a forced wrap-up — no more infinite retry loops
Post-write verification	`.py`/`.json`/`.toml` files are syntax-checked immediately after write/edit, and unresolved merge-conflict markers are flagged in files of any type; errors are fed straight back to the model
Finish guards	An empty reply, or finishing right after a failed tool call — including a bash command that exited non-zero (`[exit code: N]`) — draws one deterministic push-back instead of ending the turn half-done
Edit recovery hints	A failed `edit` shows the closest matching block from the file (with line numbers) so the next attempt copies the exact text
Todo staleness reminders	A half-finished todo list is re-surfaced so multi-step work is not silently abandoned
Step limit	After `max_iterations` the model gets one tool-free turn to summarize instead of running forever
Overload retry	HTTP 503s retry with exponential backoff
Context compaction	Truncation (default) or LLM summarization (`compact_mode=summarize`) with automatic fallback

10 Built-in Tools

Tool	Description
`bash`	Run shell commands (with dangerous command blocking)
`read`	Read file contents with line numbers
`write`	Create or overwrite files (with path validation)
`edit`	Find-and-replace editing
`glob`	Find files by pattern (`.py`, `/.ts`)
`grep`	Search file contents with regex
`web_fetch`	Fetch and parse web pages
`ask_user`	Ask the user a question
`todo_write`	Track a structured task list (pending / in-progress / done)
`agent`	Spawn sub-agents for parallel task execution

Multi-Provider

Ollama — Local inference, no API key, full privacy
Claude API — Anthropic's cloud models (Opus, Sonnet, Haiku)
Switch providers at runtime with /provider command or the desktop UI

Model Management

40+ curated models across 6 categories: Code, General, Small, Reasoning, Japanese, Multilingual
Live search from ollama.com with filters (tools, vision, thinking, code)
Install, delete, and switch models from CLI or desktop app
Interactive TUI model picker (/models or --select-model)

RAG Engine

Index your codebase for context-aware responses. Uses SQLite + embeddings with automatic re-indexing on file changes.

Git Checkpoints

Create tagged snapshots before risky edits. Roll back instantly with /rollback.

Session Persistence

Save conversations as JSONL files. Resume where you left off.

Security

Dangerous command blocking (rm -rf / and its variants, fork bombs, dd to a device, etc.)
Risky-command confirmation — recursive rm, sudo, force push, kill, shutdown, etc. prompt for approval in the REPL (pass --yes to auto-approve)
Environment sanitization (strips API keys, tokens from subprocesses)
Path traversal prevention
Ollama host validation (localhost only)

Desktop GUI

Electron app with terminal-style UI, model picker, file explorer, and settings panel.

Zero Dependencies

Python stdlib only. No pip install needed for the core CLI.

Mascot — Loca 🐈

An optional terminal companion. Pass --mascot (or LOCAL_CLI_MASCOT=cat) and the spinner becomes Loca, the local cat, blinking on one line while it thinks:

  (=･ω･=)  Thinking...      (=-ω-=)  blink      (=･ω-=)  wink

Pass --mascot pixel for an animated pixel-art Loca — a five-row cat sprite (orange fur, pink ears and cheeks, big highlighted eyes, an ω mouth) that blinks and twitches its ears via ANSI cursor control. It automatically falls back to the one-line face when output is piped or not a TTY, so cursor codes never end up in your logs. Pure decoration, default off, still zero-dependency.

Download

Desktop App (pre-built)

Download the latest release from GitHub Releases:

Platform	File
macOS (Apple Silicon)	`Local CLI-x.x.x-arm64.dmg`
Windows	`Local CLI Setup x.x.x.exe`
Linux	`Local CLI-x.x.x.AppImage`

Ollama must be installed and running on your machine.

macOS: "App is damaged" warning

The app is not code-signed. To allow it:

xattr -cr /Applications/Local\ CLI.app

Or: System Settings > Privacy & Security > Open Anyway.

CLI (from source)

# Requirements: Python 3.10+, Ollama, Git
git clone https://github.com/lutelute/local-cli.git
cd local-cli

# Run directly
python -m local_cli

# Or install as a command
pip install -e .
local-cli

CLI Usage

Quick Start

# Default model (qwen3.5:9b-q4_K_M)
local-cli

# Choose a model at startup
local-cli --select-model

# Use a specific model
local-cli --model qwen3:8b

# Enable RAG for codebase-aware responses
local-cli --rag --rag-path ./src

# Use Claude API
export ANTHROPIC_API_KEY=sk-ant-...
local-cli --provider claude

Slash Commands

Command	Description
`/help`	Show available commands
`/model <name>`	Switch model
`/models`	Open interactive model selector (TUI)
`/provider [name]`	Switch or show LLM provider
`/status`	Show connection and model info
`/install <model>`	Download a model from Ollama registry
`/uninstall <model>`	Delete a model
`/info <model>`	Show model details and capabilities
`/running`	List models currently loaded in VRAM
`/checkpoint [msg]`	Create a git checkpoint
`/rollback [tag]`	Roll back to a checkpoint
`/save`	Save current session
`/brain [model]`	Set orchestrator brain model
`/registry`	Show task-to-model routing
`/update`	Check for and install updates
`/agents`	List background sub-agent status
`/plan`	Show, create, or manage structured plans
`/ideate`	Enter brainstorming / ideation mode
`/knowledge`	Save, load, or list knowledge items
`/skills`	List or show discovered skills
`/clear`	Clear conversation
`/exit`	Quit

See docs/prompts.md for copy-paste prompt examples.

Skills

Local CLI has a skills system that auto-injects contextual instructions based on trigger keywords. Create SKILL.md files in .agents/skills/ to encode team conventions, framework guides, or domain knowledge.

.agents/skills/
├── django-api/
│   └── SKILL.md      # triggers: [django, REST API, DRF]
└── code-review/
    └── SKILL.md      # triggers: [review, PR, code quality]

See docs/skills.md for the full guide.

Desktop App

Terminal-style GUI with streaming chat, model management, and file browsing.

Features

Streaming chat with real-time tool call display
Thinking indicator — see when the AI is processing
Model picker — Catalog (curated) + Discover (live search from ollama.com)
Provider switching — Toggle between Ollama and Claude
File explorer — Browse project files in the sidebar
File viewer — Preview files without leaving the app
Settings panel — App and backend updates, keyboard shortcuts
Copyable output — Select and copy any text from the terminal
Stop generation — Interrupt AI responses mid-stream

Keyboard Shortcuts

Shortcut	Action
`Cmd/Ctrl + ,`	Settings
`Cmd/Ctrl + B`	Toggle file explorer
`Escape`	Stop generation / Close dialog
`Shift + Enter`	New line in input
`Enter`	Send message

Auto-Update

The desktop app updates automatically on startup:

Checks GitHub Releases for new versions
Downloads the update in the background
Closes the app, replaces itself, and relaunches — zero user interaction

Manual update is also available from the Settings panel (Cmd/Ctrl + ,).

Run from Source

cd desktop
npm install
npm run dev          # Development mode (hot reload)

Build Installers

cd desktop
npm run build        # Build for current platform
npm run build:mac    # macOS (.dmg + .zip)
npm run build:win    # Windows (NSIS installer)
npm run build:linux  # Linux (AppImage + .deb)

Configuration

Configuration is resolved in order: CLI flags > environment variables > config file > defaults.

Flag	Env Var	Default	Description
`--model`	`LOCAL_CLI_MODEL`	`qwen3.5:9b-q4_K_M`	Model to use
`--provider`	`LOCAL_CLI_PROVIDER`	`ollama`	LLM provider
`--debug`	`LOCAL_CLI_DEBUG`	`false`	Debug output
`--rag`	—	`false`	Enable RAG
`--rag-path`	—	`.`	Directory to index
`--rag-topk`	—	`5`	RAG results per query
`--rag-model`	—	`all-minilm`	Embedding model
`--select-model`	—	`false`	Interactive model picker
`--server`	—	`false`	JSON-line server mode
`--yes` / `-y`	—	`false`	Auto-approve risky commands (skip confirmation)
`--update`	—	`false`	Check for updates now (git pull + reinstall)
`--auto-update`	`LOCAL_CLI_AUTO_UPDATE`	`false`	Install available updates automatically on startup, then restart
—	`LOCAL_CLI_COMPACT_MODE`	`truncate`	Context compaction: `truncate` or `summarize`
—	`LOCAL_CLI_MAX_ITERATIONS`	`40`	Agent step limit per turn (`0` = unlimited)
`--mascot [style]`	`LOCAL_CLI_MASCOT`	`off`	Loca the local cat: `--mascot` for the one-line face `(=･ω･=)`, `--mascot pixel` for animated pixel art (TTY only; falls back to the face in pipes)

Config file location: ~/.config/local-cli/config (key=value format).

Claude API

Set the ANTHROPIC_API_KEY environment variable to enable Claude as a provider:

export ANTHROPIC_API_KEY=sk-ant-api03-...
local-cli --provider claude

Switch at runtime with /provider claude or /provider ollama.

Recommended Models

Model	Size	Best For
`qwen3:8b`	5.2 GB	General use, tool calling
`qwen2.5-coder:7b`	4.7 GB	Code generation
`qwen3:30b`	18.5 GB	Complex reasoning
`deepseek-r1:14b`	9.0 GB	Chain-of-thought
`gemma3:12b`	8.1 GB	Multilingual, Japanese
`qwen3:0.6b`	0.5 GB	Quick testing

Agent-quality guidance (measured with scripts/harness_eval.py): tool-trained models from ~4B up complete multi-step agent tasks reliably, in English and Japanese (qwen3.5:4b scored 7/7 on the eval suite). Sub-1B models handle simple create/run tasks but fail multi-step edits even with the harness pushing back. Chat-specialized models without tool training (e.g. Japanese conversation models) run via the text-driven fallback and manage single tool calls, but tend to go silent mid-task — prefer tool-trained models for real agent work.

Architecture

local-cli/
├── local_cli/
│   ├── __main__.py              # Entry point (decomposed startup steps)
│   ├── agent.py                 # Unified agent loop (run_agent + emitters)
│   ├── harness.py               # Deterministic harness interventions
│   ├── cli.py                   # REPL + slash commands
│   ├── config.py                # Configuration (CLI > env > file > defaults)
│   ├── server.py                # JSON-line server for desktop GUI
│   ├── ollama_client.py         # Ollama REST API client
│   ├── orchestrator.py          # Multi-provider orchestration
│   ├── model_catalog.py         # 40+ curated models + cache
│   ├── model_search.py          # Live search from ollama.com
│   ├── model_manager.py         # Install / delete / info
│   ├── model_registry.py        # Task-to-model routing
│   ├── model_selector.py        # Interactive TUI picker
│   ├── rag.py                   # RAG engine (SQLite + embeddings)
│   ├── git_ops.py               # Git checkpoint / rollback
│   ├── session.py               # Session persistence (JSONL)
│   ├── security.py              # Input validation + sanitization
│   ├── updater.py               # Self-update (git pull)
│   ├── sub_agent.py             # Sub-agent runner (thread pool)
│   ├── plan_manager.py          # Structured plan management
│   ├── knowledge.py             # Persistent knowledge store
│   ├── skills.py                # Skill discovery & matching
│   ├── providers/
│   │   ├── base.py              # Abstract LLMProvider
│   │   ├── ollama_provider.py   # Ollama adapter
│   │   ├── claude_provider.py   # Claude API adapter
│   │   ├── message_converter.py # Format normalization
│   │   └── sse_parser.py        # SSE streaming parser
│   └── tools/                   # 10 agent tools
│       ├── bash_tool.py         # Shell execution
│       ├── read_tool.py         # File reading
│       ├── write_tool.py        # File creation
│       ├── edit_tool.py         # String replacement
│       ├── glob_tool.py         # File pattern search
│       ├── grep_tool.py         # Content search (regex)
│       ├── web_fetch_tool.py    # URL fetching
│       ├── ask_user_tool.py     # User prompts
│       ├── todo_tool.py         # Structured task tracking
│       └── agent_tool.py        # Sub-agent spawning
├── desktop/                     # Electron + React + Vite
│   ├── electron/                # Main process + preload
│   ├── src/                     # React UI components
│   └── build/                   # App icons
├── tests/                       # 2254 tests
└── pyproject.toml               # Zero dependencies

Tests

python -m pytest tests/ -q
# 2254 passed

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 212 Commits
.auto-claude/specs/007-extend-orchestration-feature-to-match-claude-code-		.auto-claude/specs/007-extend-orchestration-feature-to-match-claude-code-
.claude/commands		.claude/commands
assets		assets
desktop		desktop
docs		docs
local_cli		local_cli
scripts		scripts
tests		tests
.gitignore		.gitignore
README.easy-ja.md		README.easy-ja.md
README.ja.md		README.ja.md
README.md		README.md
e2e_verify.mjs		e2e_verify.mjs
hello.py		hello.py
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

What is this?

Features

Agent Loop

Deterministic Harness

10 Built-in Tools

Multi-Provider

Model Management

RAG Engine

Git Checkpoints

Session Persistence

Security

Desktop GUI

Zero Dependencies

Mascot — Loca 🐈

Download

Desktop App (pre-built)

macOS: "App is damaged" warning

CLI (from source)

CLI Usage

Quick Start

Slash Commands

Skills

Desktop App

Features

Keyboard Shortcuts

Auto-Update

Run from Source

Build Installers

Configuration

Claude API

Recommended Models

Architecture

Tests

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 20

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages